Preface



This volume presents the proceedings of the 2nd International Workshop on
Algebraic Frames for the Perception and Action Cycle, AFPAC 2000, held in Kiel,
Germany, 10-11 September 2000. The presented topics cover new results in the 
conceptualization, design, and implementation of visual sensor-based robotics 
and autonomous systems. Special emphasis is placed on the role of algebraic 
modelling in the relevant disciplines, such as robotics, computer vision, theory 
of multidimensional signals, and neural computation. The aims of the workshop 
are twofold: first, discussion of the impact of algebraic embedding of the task 
at hand on the emergence of new qualities of modelling and second, facing the 
strong relations between dominant geometric problems and algebraic modelling. 

The first workshop in this series, AFPAC’97, inspired several groups to initiate
new research programs, or to intensify ongoing research work in this field,
and the range of relevant topics was consequently broadened. The approach 
adopted by this workshop does not necessarily fit the mainstream of worldwide 
research-granting policy. However, its search for fundamental problems in our 
field may very well lead to new results in the relevant disciplines and contribute 
to their integration in studies of the perception-action cycle. 

The background of the workshop is the design of autonomous artificial 
systems following the paradigm of behavior-based system architectures. The 
perception-action cycle constitutes the framework in which the designer has 
to make sure that robust, stable, and adaptive system behavior will result. The 
mathematical language used to shape this frame is crucial for getting system fea- 
tures such as the ones mentioned above or, in addition, semantic completeness 
and in some cases linearity. By semantic completeness we mean a representation
property which is purpose-oriented in its nature, rather than completeness in the
traditional mathematical sense. While linearity is, without
any restriction, a useful system property, most of the problems we have to handle 
turn out to be nonlinear. We learn from the approach of this workshop that this 
is not a matter of fate which traditionally results in non-complete, approximating 
linearizations. Instead, various problems can be algebraically transformed into 
linear and, thus, complete ones. The reader can identify this approach in several 
contributions related to multidimensional signal processing, neural computing, 
robotics, and computer vision. 

This volume includes 7 invited papers and 20 regular papers. The invited 
papers were contributed by members of the program committee. Regrettably, not
all of them were able to present a talk or to contribute a paper to the proceed- 
ings. We wish, however, to thank all of them for their careful reviewing of the 
contributed papers. All authors of papers presented in this volume contributed 
to important aspects relevant to the main theme of the workshop. Our thanks 
go to all the authors of the invited and contributed papers for the high quality 
of their contributions and for their cooperation. 

We thank the Christian-Albrechts-Universität Kiel for hosting the workshop
and the industrial sponsors for their financial support. Special thanks go
to the Deutsche Forschungsgemeinschaft (DFG) which, by awarding grant no.
4851/223/00, made it possible to invite selected speakers. Last but not least, the
workshop could not have taken place without the extraordinary commitment of 
the local organizing committee. 

Analyzing Action Representations

Abstract. We argue that actions represent the basic seed of intelligence
underlying perception of the environment, and the representations en- 
coding actions should be the starting point upon which further studies 
of cognition are built. In this paper we make a first effort in characterizing
these action representations. In particular, from the study of simple
actions related to 3D rigid motion interpretation, we deduce a number 
of principles for the possible computations responsible for the interpretation
of space-time geometry. Using these principles, we then discuss
possible avenues on how to proceed in analyzing the representations of 
more complex human actions. 



1 Introduction and Motivation 

During the late eighties, with the emergence of active vision, it was realized that 
vision should not be studied in a vacuum but in conjunction with action. Adopt- 
ing a purposive viewpoint makes visual computations easier by placing them 
in the context of larger processes that accomplish tasks. This new framework, 
known by a variety of names such as active, purposive, animate or behavioral 
vision, has contributed a wealth of new technical results and has fueled a vari- 
ety of new application areas. Several groups constructed active vision systems 
and studied the basics of many visual competences, but for the most part, this 
study of the perception/action coupling concentrated on problems related to 
navigation. A flurry of activity produced a veritable cornucopia of algorithmic 
approaches to 3D motion estimation, 3D shape recovery, tracking, and motion 
and scene segmentation. As a matter of fact, one can say that this is the most
mature area in the field of computer vision. This was, perhaps, not surprising since
all these problems are related to the most basic action of all, one that transcends 
all systems with vision, that of self-motion; although the perception/action cy- 
cle surrounding the basic action of self-motion is not yet fully understood (we 
will discuss later some of the intricacies involved), it is clear that the problem 
is one of geometry and statistics awaiting the right modeling for its complete 
resolution. But an intelligent system with perception is not just a system that 
understands its movement, and the shape and movement of other objects in 
its field of view. It has a large number of capabilities related to recognition,
reasoning, planning, imagination and learning. Despite the fact that the active
vision framework was embraced as the right tool in dealing with the recovery 
of 3D information from image sequences, that was not the case when it came 
to higher-level problems such as recognition, planning and reasoning. Lacking 
a foundation, not much progress has been made in those higher-level problems. 
For example, with the exception of simple objects in controlled environments, 
not much progress has been made in the field of object recognition. All new
proposals during the past decade have followed some advance in the structure 
from motion problem.¹ There are, of course, many reasons for this, and some of
them will become clear in the course of our exposition. 

The goal of this paper is twofold, one methodological and the other technical. 
Our intent is to put forward the thesis that action represents the basic seed 
of intelligence, underlying perception of our environment, which is our major 
interest, recognition and other high-level processes such as communication. If 
this is the case, action representations are very basic components of an intelligent 
system. What are they? What could their nature and kind be? How can we build 
them? What does it mean to understand them? Most important, what techniques 
should we employ in order to obtain them? Answering these questions is our 
second goal and we approach it using only computational arguments. Before 
we begin, we need to discuss other efforts attempting to answer such kinds of 
questions. 

2 Intelligence, Learning and the Stratification of Reality 

Slowly but steadily, the paradigm that has dominated the study of cognition and 
the fields surrounding it for the past century is starting to be challenged. The
dominant approach has been to view mental states as having meaning in terms of 
symbol systems like natural language. The emerging new approach is referred to 
as embodied intelligence or the sensorimotor theory of cognition which proposes 
that higher cognition makes use of the same structures as those involved in 
sensorimotor activity. This view is clearly exemplified in recent efforts in AI to
achieve intelligence through building robots and having them “develop” through 
interaction with their environment. Although there exist interesting points in 
the contemporary sensorimotor theories of cognition, and the projects of AI 
engineers have merits of their own, there exist two fundamental problems with 
them that we explain here. 

The belief that smaller systems develop through interaction with their envi- 
ronment and become integrated to form new and more advanced systems owes 
its appeal to its intended mimicking of evolution. It makes, however, an implicit 
assumption, namely that evolution is evolution in degree and not evolution in 
kind. By evolution in kind we mean the introduction of new components having 

¹ When it became clear that 3D information could be extracted from point and line
correspondences, researchers concentrated on pose recovery in the polyhedral world. 
When view interpolation was achieved for some cases, recognition approaches con- 
sidering objects as collections of 2D views appeared. 
qualitatively different characteristics from the existing components. This leads
to a strange kind of one-way relationship, both between the whole and its con- 
stituent parts within the system, and between higher systems and their primitive 
ancestors. One can define this relationship by saying that the whole is its parts,
and continues to be so even if, as the result of the introduction of new
subsystems (kind), it acquires a number of additional characteristics in the course of
its evolution. The subsidiary systems themselves do not gain any higher
characteristics and may even lose some in the process of simplification. The one-way
relationship consists in the system as a whole possessing all the characteristics
of its component parts, but none of these parts possesses the characteristics
specific to the whole. Similarly, every system possesses most of the characteristics
of its primitive ancestors, but even complete knowledge of a system’s charac- 
teristics will not make it possible to predict those of its more highly developed 
descendants [17]. Thus, we can explain a system only if we accept the present 
structures in its body as our data.²

The second problem, related to the first, is best explained by utilizing the
ideas of the philosopher Nicolai Hartmann [15]. Like all processes in life, those
of acquiring and storing relevant information take place on many different levels
and are interlinked at many points. The world in which we live, according to
Hartmann, is built of different strata, each with its own existential categories that
distinguish it from other strata. “There are in the hierarchy of existence certain 
phenomenal realities whose fundamental differences our minds fail to bridge .... 
Any true and accurate theory of categories must have as much regard for these 
gaps as for the existential relationships that bridge them.” These relationships, 
however, only transcend in a unilateral manner the divisions between the four 
great strata of existence — the inorganic, organic, cognitive and conscious. The 
principles of existence and laws of nature that govern inorganic matter apply with 
equal validity to higher strata. But Hartmann insists that the differences between 
the higher and lower strata are far from restricted to the distinctions between 
inorganic, organic, cognitive and conscious. “The higher elements that make up 
the world are stratified in a similar way to the world itself,” he wrote. This
means that each step leading from a system of a lower order to one of a higher
order is the same in nature and complexity as the coming into existence of life
itself. This viewpoint about the stratification of existence can guard us against
common mistakes in studying cognition. If we neglect the categories and laws
that are exclusive properties of the higher strata or even deny their existence,
we commit the fallacy of an upward transgression of the limits imposed upon us
by the stratification of the real world. It is not possible to explain the laws and
processes proper to higher levels in terms of categories derived from lower levels. 
Another common mistake results from transgressing the boundaries between 
levels of existence in the opposite direction. This is described by Hartmann 



² Irrational Residue: The number of historical causes one would need to know to fully
explain why an organism is as it is may not be infinite, but it is sufficiently great
to make it impossible to trace all causal chains to their end. As Polanyi pointed
out [19], a higher animal cannot be reduced to its simpler ancestors.

as follows: The base of the entire world picture is then chosen on the level of
conscious experience — the level on which man experiences his own subjective 
life — and from there the principle is extended downwards to the lower level of 
reality. 

What does this all mean for us who wish to gain an understanding of per- 
ceptual mechanisms and, in particular, how action representations make up an 
intelligent system? There are several lessons we can obtain. It is a mistake, while 
seeking unitary explanatory principles behind the world, to try to explain lower 
and more primitive systems on the basis of principles applicable only to higher 
systems and vice versa. Similarly, the attempts of some psychologists and be- 
havioral scientists to achieve intelligence through learning are at fault. This is 
true for both primitive systems that are incapable of it and systems of more 
advanced organisms which are not only incapable of being modified by learning
but whose phylogenetic program makes them resistant to all modification.

The only chance at understanding intelligent systems seems to be to study 
them by accepting their present structures as our data, and work towards
understanding the different components. But here is the tricky part. The components
we study should be such that they do not require other components in order to
be understood, or if they do, all interrelated components should be studied
together. Otherwise it appears that there is no hope for understanding. And finally,
our only tools should be the ones of the physical sciences, geometry, statistics 
and computation. 



3 Understanding Actions, Objects 

Our task is to increase our understanding of how to piece together the compo- 
nents of an intelligent system with vision. Our thesis is that action is fundamental 
in this regard. It is of course irresistible to attempt to arrive at a unitary world 
image and try to explain the diversity of the world (and intelligent systems) on 
the basis of ontological and phenomenal principles of one single kind. Accord- 
ing to our previous analysis this would be a mistake after all. But our goals 
are much more modest. We want to uncover principles underlying the workings 
of the perceptual system and in particular shed light on the nature of action 
representations. 

Our basic thesis is that the content of the mind is organized in the form of a 
model of the world. That model contains objects and their relationships, events 
and the interaction of the system with its environment. In building, maintaining, 
manipulating and working with that model we should distinguish three items: 
the world itself, the system (its body) and the system’s interaction with the 
world. What could the best candidate be for building a foundation on the basis 
of which this complex model can be understood? 

Recall that it is trivially true that understanding a thing is relating it to 
something already understood; if we continue along this chain, this means that 
there must be something understood in terms of itself. This entity must be
something with which a system is intimately familiar in a non-symbolic way. We 
conclude that the best candidate for this is the agent’s own activity. 

There are a number of basic actions that are the basis of acquiring models
of the shape and motion of the environment. These actions are related to the 
system’s understanding of its own motion and they are universal, as all systems 
must have them in some form. The next sections are devoted to the computa- 
tional mechanisms underlying the coupling of such actions with perception. By 
taking a look at these actions, on the one hand we can give an example and 
elaborate on what it means to understand these actions. On the other hand, 
since the computations involved in these basic actions form the foundation for 
many other computational processes we gain insight into the nature of space- 
time representations. That is, it allows us to deduce a number of principles on 
what the shape and motion models of the environment might be and how they 
could be computed. The computational principle of feedback loops is emerging 
as a foundational theme. 

Next we are interested in the very large repertoire of more complex actions 
through which an agent interacts with its environment. Since understanding
these actions should serve as a foundation for all other understanding, we should 
not presuppose understanding of other things. We merely assume that the agent 
understands its own body. Equipped with this knowledge and the foundation 
of space-time perception delivered from basic actions, we can provide a
characterization of action as: One understands an action if one is able to imagine
performing the action with images that are sufficient for serving as a guide in
actual performance. This characterization provides two things. First, it shows
that it is the ability to imagine an action which constitutes understanding it. 
Imagining it amounts to representing the constitutive components of the action 
as such, in a form that is capable of guiding the performance of the action. 
Because the components of the action form the content of the agent’s mental 
state, the agent can be said to understand it. Second, this sort of understanding 
produces the feeling of confidence that one understands an action, because the
understanding consists in a mental state of which the agent can be aware. Mov- 
ing along this line of thought, understanding anything amounts to knowing the 
possible actions one might perform in relation to that thing and being aware of 
that understanding is consciously imagining those actions. Thus, we can say that 
one has an understanding of a perceived object if one is able to imagine incorpo- 
rating the object into the performance of an action that is already understood, 
with an image sequence sufficient for serving as a guide in actual performance.

Thus, the ability to intentionally perform an action, manifested in the ability 
to imagine performing it, is a way of understanding the action that presupposes 
no other understanding. Similarly, an object that is intentionally incorporated 
into the performance of an (understood) action is itself understood in relation to 
that action. In moving from the understanding of actions to the understanding 
of objects we have crossed an important boundary. Understanding actions does 
not require a concept of the objective external world; that is how it can be a 
foundation for such a concept. But understanding objects does require such a
concept. 

It is not our intention to delve further into a philosophical inquiry about 
understanding objectivity and the mind/world distinction. Our purpose here is 
to achieve an understanding of action representations. Having a crisp character- 
ization of what it means to understand an action we will be ready in Sect. 4 
to investigate constraints on such representations. We will not delve into ob- 
ject recognition issues but the framework is in place to carry this investigation 
further into higher-level problems. 

But remaining true to the principles of embodied intelligence, we cannot
study actions in a technical sense unless we commit ourselves to a particular
body. Of course the same is true for the basic actions and the perception of shape 
and motion, but in that case the dependence lies mostly in image acquisition 
(eye design) and computational capacity; in our days of extreme computational 
power these issues are not that important. But when it comes to higher-level 
actions and interaction with the world, it is important to choose a body. A 
failure to do so will turn our study completely philosophical. As such a body, we 
choose the human body. 

4 Basic Actions 

The most basic actions are the ones which can be based on the most simple model 
of 3D motion, and this is the model of instantaneous rigid motion. As living 
organisms move through the environment their eyes undergo a rigid motion. The 
visual system has to recognize and interpret this rigid motion from the visual 
input, that is, the sequence of images obtained by its eyes. This immediately 
gives rise to two basic actions: first, the capability of estimating one’s self-motion
or egomotion; second, the detection of objects in the static scene which move 
themselves, that is, the detection of independently moving objects. Similarly, the 
motion of various objects is rigid or can be closely approximated by one. Thus 
a third action arising from this model is the estimation of rigid object motion. 
This action, however, is much more difficult than the estimation of self-motion.
One reason is that objects usually cover a small field of view and thus less
data is available for the estimation. Another reason is that objects lie outside 
the system’s body and thus there is less information from other sensors. In 
contrast, for self-motion estimation biological systems — in addition to vision — 
also use inertial sensors (such as those in our ears). As a result, most systems 
only possess the capability to partially estimate rigid object motion. Even rather 
simple organisms understand the approaching of the enemy, or can estimate 
how long it will take to intercept an object (the estimation of time to contact). 
A somewhat more sophisticated but still partial estimation would be a rough 
estimate of the object’s translation or rotation. 

The visual input is a sequence of images, a video. Aside from how the images 
are formed (that is, the properties of the eye or camera), the images depend on 
the spatiotemporal geometry, that is, the layout of the scene and the relative 3D 
motion between observer and scene, and the optics, that is, the light sources and
the reflectance properties of the surfaces. By considering only changes of image
patterns, an image representation of the movement of the 3D scene points is
obtained. This representation, a vector field, ideally is due only to the geometry,
that is, the relative 3D motion between the eye and the scene and the shape of
the scene. If the relative 3D motion between the observer and parts of the scene
is rigid, then segmentation amounts to the localization of those parts of the flow
field corresponding to the same rigid motion, and motion estimation amounts
to decoding the 3D motion parameters. The shape of the scene determines the
remaining components of the vector field and it can be decoded after motion
estimation. 

The interpretation of motion information, probably because of its well-defined
geometric nature, has received a lot of attention in the computer vision literature.
A methodology emerged under which most studies were conducted. It defined
rigid motion interpretation as the “structure from motion” problem, to be
carried out in three consecutive computational steps. First, the exact movement
of image points is estimated, either in the form of the optical flow field if
images densely sampled in time are considered, or in the form of correspondences
of single image points if images further apart in time are used. The optical
flow field or correspondences are approximations of the projections of the 3D
motion field, which represents the movement of scene points. The second
computational step consists in estimating the 3D rigid motion from the optical
flow field, and the third step amounts to estimating the 3D structure of the
scene, usually from multiple flow fields, using the estimates of the 3D motion.

With this framework it has been considered that exact reconstruction of 
space-time geometry is possible (it is just a matter of using the right
sophisticated tools) and that it can be addressed computationally in a purely
bottom-up manner. We will argue that this cannot be. Throughout this and the
next section we will provide computational arguments supporting our view and elaborate on
what the computations should be. 

Fig. 1. Aperture problem: (a) Line feature observed through a small aperture at time t.
(b) At time t + δt the feature has moved to a new position; it is not possible to
determine exactly where each point has moved to. From local measurements only the
flow component perpendicular to the line feature can be computed.

First, optical flow or correspondence cannot be estimated accurately using
only the image data, that is, before performing estimations about the 3D
geometry. The local image data defines only one component of the image velocity. This
is the so-called aperture problem illustrated in Fig. 1: Local image data usually
provide only one-dimensional spatial structures. From the movement of these
linear features between image frames the movement of single image points cannot
be determined, but only the flow component perpendicular to the linear feature;
this is the so-called normal flow. In order to estimate the two-dimensional image
flow, a model of the flow field is needed; then normal flow measurements in different
directions within local neighborhoods can be combined. Here lies the problem:
In order to model the flow field, additional information is necessary. It is easy
to model the parts of the image which correspond to smooth scene patches by
using smoothness assumptions of different kinds, but there are discontinuities
in the flow field, and their locations must be known prior to modeling the flow
field. The discontinuities in the flow field are due either to independently
moving objects in the scene or to scene surfaces at different depths. On the other
hand, the discontinuities cannot be detected unless an estimate of the flow field
is available. Thus the problems of estimating the flow field and locating the
discontinuities are inherently coupled, and the situation presents itself as a
chicken-and-egg problem. Discontinuities pose an even more severe problem
than is usually considered. Most of the literature considers the idealized problem
of estimating the motion of one rigidly moving observer in a static
environment. Thus all the discontinuities these studies deal with are due to depth. Even if the
locations of the discontinuities in the flow field are known, a system would need
additional information about the 3D motion and scene to distinguish the motion
discontinuities from the pure scene discontinuities. There has also been a lot of
work in the literature on motion segmentation, but these studies consider the
problem separately from 3D motion estimation: either they attempt a localization
of the discontinuities on the basis of flow field information only, or they assume
the 3D motion to be known. It is clear that for a real system the two problems of
3D motion estimation and independent motion detection have to be considered
together.
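
The aperture problem described above can be made concrete with a small numerical sketch. The function name `normal_flow` and the sample velocities are illustrative assumptions, not part of the original text. The sketch only shows that two different image velocities which share the same component along the gradient direction produce the same local measurement.

```python
import numpy as np

def normal_flow(flow, gradient):
    """Project a 2D image velocity onto the unit gradient direction.

    Returns the signed normal-flow value and the normal-flow vector.
    (Hypothetical helper, introduced only for illustration.)
    """
    n = np.asarray(gradient, dtype=float)
    n = n / np.linalg.norm(n)          # unit vector along the image gradient
    value = float(np.dot(flow, n))     # the only component observable locally
    return value, value * n

# A vertical line feature has a horizontal image gradient.
gradient = np.array([1.0, 0.0])

# Two different true velocities that differ only along the line feature.
flow_a = np.array([2.0, 0.0])
flow_b = np.array([2.0, 5.0])

value_a, vec_a = normal_flow(flow_a, gradient)
value_b, vec_b = normal_flow(flow_b, gradient)

# Both velocities give the same local measurement.
print(value_a, value_b)
```

The component along the line feature (here the vertical one) is lost in the projection, which is why a model of the flow field is needed to combine measurements from several gradient directions.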

Besides the problem due to discontinuities there are also statistical difficulties
with the estimation of optical flow. As will be elaborated on later, the estimation
of optical flow from noisy normal flow measurements even within neighborhoods
of smooth flow is very difficult. To avoid bias and obtain very accurate flow,
theoretically one would need detailed models of the flow field, and these in
turn can only be obtained from 3D information; specifically, this means the
discontinuities, the shape of the scene and the 3D motion. 

Since accurate bottom-up estimation of optical flow seems to be infeasible,
the question is whether it is possible to perform 3D motion estimation using
normal flow measurements as input. In the past we have studied this question, and we
have developed two constraints that allow 3D geometry to be extracted directly from
normal flow: one constraint relates image motion to 3D motion only, and
a second one also involves the scene.

5 Relating Image Measurements to 3D Geometry 

Consider a system moving with rigid motion in a static environment. The motion
is described by an instantaneous translational velocity $\mathbf{t}$ and a rotational velocity
$\boldsymbol{\omega}$. Each scene point $\mathbf{R} = (X, Y, Z)$, measured with respect to a coordinate system
$OXYZ$ fixed to the nodal point of the system’s eye, where $Z$ is the optical axis,
has velocity $\dot{\mathbf{R}} = -\mathbf{t} - \boldsymbol{\omega} \times \mathbf{R}$.

We consider the image to be formed on a plane orthogonal to the $Z$ axis
at distance $f$ (focal length) from the nodal point. Thus, through perspective
projection, image points $\mathbf{r} = [x, y, f]$ are related to scene points $\mathbf{R}$ by
$\mathbf{r} = \frac{f}{\mathbf{R} \cdot \mathbf{z}} \mathbf{R}$, where $\mathbf{z}$ is a unit vector in the direction of the $Z$ axis.
The projection of the 3D motion field on the image gives the image motion field

$$
\dot{\mathbf{r}} = -\frac{1}{(\mathbf{R} \cdot \mathbf{z})} \bigl(\mathbf{z} \times (\mathbf{t} \times \mathbf{r})\bigr) + \frac{1}{f}\, \mathbf{z} \times \bigl(\mathbf{r} \times (\boldsymbol{\omega} \times \mathbf{r})\bigr) = \frac{1}{Z}\, \mathbf{u}_{tr}(\mathbf{t}) + \mathbf{u}_{rot}(\boldsymbol{\omega}) \qquad (1)
$$

where $Z$ is used to denote the scene depth $(\mathbf{R} \cdot \mathbf{z})$ and $\mathbf{u}_{tr}$, $\mathbf{u}_{rot}$ are the direction of
the component of the flow due to translation and the component of the flow due
to rotation, respectively. Due to the coupling of $Z$ and $\mathbf{u}_{tr}$ in this relationship
only the direction of translation (the Focus of Expansion, FOE, or focus of
contraction, FOC, depending on whether the observer is approaching or moving
away from the scene), scaled depth and the three rotational parameters can be
obtained. The direction of the axis of rotation will be denoted as AOR. The
normal flow $v_n$ amounts to the projection of the flow $\dot{\mathbf{r}}$ on the local gradient
direction. If $\mathbf{n}$ is a unit vector denoting the orientation of the gradient, the value
of the normal flow amounts to

$$
v_n = \dot{\mathbf{r}} \cdot \mathbf{n}
$$
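
As a numerical sketch of the image motion field of Eq. (1) and of the normal flow, the following code evaluates both terms of the flow and compares the result with a finite-difference projection of a scene point moving with velocity -t - w x R. The function names and all numerical values are assumptions chosen for illustration; `w` stands for the rotational velocity.

```python
import numpy as np

def project(R, f):
    """Perspective projection r = f R / (R . z), with z = (0, 0, 1)."""
    return f * R / R[2]

def image_motion(r, Z, t, w, f):
    """Image motion field of Eq. (1) at image point r = [x, y, f] with depth Z."""
    z = np.array([0.0, 0.0, 1.0])
    u_tr = -np.cross(z, np.cross(t, r))                    # translational part, scaled by 1/Z below
    u_rot = np.cross(z, np.cross(r, np.cross(w, r))) / f   # rotational part, independent of depth
    return u_tr / Z + u_rot

def normal_flow_value(r_dot, n):
    """Normal flow: projection of the flow on the unit gradient direction n."""
    return float(np.dot(r_dot, n))

f = 1.0                                  # focal length (assumed value)
t = np.array([0.1, -0.2, 0.5])           # translational velocity (assumed)
w = np.array([0.02, 0.01, -0.03])        # rotational velocity (assumed)
R = np.array([0.4, -0.3, 2.0])           # scene point in front of the camera

r = project(R, f)
flow = image_motion(r, R[2], t, w, f)

# Finite-difference check: move the scene point with velocity -t - w x R.
dt = 1e-6
R_dot = -t - np.cross(w, R)
flow_fd = (project(R + dt * R_dot, f) - r) / dt

n = np.array([1.0, 0.0, 0.0])            # gradient direction along the x axis
v_n = normal_flow_value(flow, n)
print(flow, flow_fd, v_n)
```

Only the translational term carries the factor 1/Z, which is the coupling between depth and translation mentioned in the text.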



The first constraint, the “pattern constraint,” uses as its only physical constraint
the fact that the scene lies in front of the camera and thus all the depth values
have to be positive. This allows the sign of normal flow measurements to be
related to the directions of the translation and the rotation, that is, the FOE and
AOR. The basis lies in selecting groups of measurements of normal flow along
pre-defined directions (orientation fields) which form patterns in the image plane,
and these patterns encode the 3D motion [4], [5], [6], [7].
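
To see why positive depth is the only physical assumption needed, one can project Eq. (1) on a unit direction $\mathbf{n}$. This is a short sketch that follows directly from Eq. (1) and the definition of normal flow:

$$
v_n = \dot{\mathbf{r}} \cdot \mathbf{n} = \frac{1}{Z}\,\bigl(\mathbf{u}_{tr}(\mathbf{t}) \cdot \mathbf{n}\bigr) + \mathbf{u}_{rot}(\boldsymbol{\omega}) \cdot \mathbf{n},
\qquad
Z > 0 \;\Rightarrow\; \operatorname{sign}\!\Bigl(\tfrac{1}{Z}\,\mathbf{u}_{tr}(\mathbf{t}) \cdot \mathbf{n}\Bigr) = \operatorname{sign}\bigl(\mathbf{u}_{tr}(\mathbf{t}) \cdot \mathbf{n}\bigr).
$$

Wherever the translational and rotational terms have the same sign, the sign of $v_n$ is therefore determined by the 3D motion alone, without knowledge of the depth $Z$.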

There are two classes of orientation fields. The first class are called copoint
fields. Each copoint field, parameterized by $\mathbf{s}$, is defined as the unit vectors
perpendicular to a translational flow field with translation $\mathbf{s}$. These unit vectors are
in the direction $\mathbf{v}_{cp}(\mathbf{s},\mathbf{r}) = \mathbf{z} \times \mathbf{u}_{tr}(\mathbf{s})$, that is, parallel to
$\mathbf{z} \times (\mathbf{z} \times (\mathbf{s} \times \mathbf{r}))$, and they are perpendicular
to the bundle of lines passing through $\mathbf{s}_0$ (the point where $\mathbf{s}$ intersects
the image plane) (Fig. 2a).

Fig. 2. (a) Copoint field filled with qualitative motion measurements, (b) copoint
vectors and copoint pattern, (c) coaxis vectors and patterns

Now consider a normal flow field due to rigid motion $(\mathbf{t}, \boldsymbol{\omega})$ and consider
all the normal flow vectors of this field which are in direction $\mathbf{v}_{cp}(\mathbf{s},\mathbf{r})$, that is,
the vectors of value $\dot{\mathbf{r}} \cdot \mathbf{v}_{cp}$. The translational components of these vectors are
separated by the line $(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} = 0$ into an area with negative values
and an area with positive values. Similarly, the rotational components of the
same vectors are separated by the conic $(\boldsymbol{\omega} \times \mathbf{r}) \cdot (\mathbf{s} \times \mathbf{r}) = 0$ into areas of positive and
negative values. The rigid motion field is the sum of the translational and the
rotational flow field; thus the areas above need to be superimposed. Where both 
the translational and rotational components are positive the combined flow is 
positive and where both the translational and rotational components are negative 
the combined flow is negative. In the remaining areas the sign is undetermined. 
The areas of different sign form a pattern in the image plane as shown in Fig. 2b. 
For every direction Vcp parameterized by s there is a different pattern and the 
intersection of the different patterns provides the 3D motion. 
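
The sign bookkeeping behind a copoint pattern can be checked numerically. The following Python sketch is an illustration only, not the authors' implementation: it assumes a planar image at unit focal length, so that r = (x, y, 1), takes u_tr and u_rot to be the two terms of (1), and uses hypothetical helper names.

```python
import numpy as np

Z_AXIS = np.array([0.0, 0.0, 1.0])  # optical axis z (assumed unit focal length)


def u_tr(t, r):
    """Translational term of (1) without the 1/Z factor: z x (t x r)."""
    return np.cross(Z_AXIS, np.cross(t, r))


def u_rot(w, r):
    """Rotational term of (1): z x (r x (w x r))."""
    return np.cross(Z_AXIS, np.cross(r, np.cross(w, r)))


def copoint_vector(s, r):
    """Unnormalized copoint direction v_cp(s, r) = z x u_tr(s)."""
    return np.cross(Z_AXIS, u_tr(s, r))


def copoint_pattern_label(t, w, s, r):
    """Return +1 or -1 where both flow components share that sign, else 0."""
    v = copoint_vector(s, r)
    sign_tr = np.sign(np.dot(u_tr(t, r), v))    # dividing by Z > 0 keeps the sign
    sign_rot = np.sign(np.dot(u_rot(w, r), v))
    return int(sign_tr) if sign_tr == sign_rot else 0
```

Under these assumptions the translational component along the copoint vector equals -(t x s) . r, which changes sign across a line, and the rotational component equals (w x r) . (s x r), which changes sign across a conic; the label 0 marks the area where the sign of the normal flow is undetermined.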

The second class of patterns is defined by the orientation field of coaxis
vectors, which are the unit vectors perpendicular to a certain rotational flow
field, that is, the coaxis vectors parameterized by $\mathbf{s}$ are in direction
$\mathbf{v}_{ca}(\mathbf{s},\mathbf{r}) = \mathbf{z}\times\mathbf{u}_{rot}(\mathbf{s}) = \mathbf{z}\times(\mathbf{z}\times(\mathbf{r}\times(\mathbf{s}\times\mathbf{r})))$. In this case the translational
components are separated by a conic and the rotational components are separated
by a line into positive and negative values, giving patterns such as that shown
in Fig. 2c. Using these pattern constraints, the problem of self-motion estimation
amounts to recognizing particular patterns in the image plane. The system would
compute the normal flow and then, for a number of orientation fields, find the
patterns which are consistent with the sign of the estimated normal flow vectors.
The intersection of the possible patterns provides the solution, usually in the
form of possible areas for the FOE and AOR, as most often there is not enough
data to provide an exact localization of the 3D motion.

The question now is whether any technique can accurately compute the 3D rigid
motion parameters. The answer is no if the available data come from a limited
field of view. This statement is based on an error analysis conducted in [8], [9].
Estimating 3D motion amounts to minimizing some function. If the pattern
constraints are used, this function is based on negative depth values, and if the
classical epipolar constraint requiring optical flow is used, this function is based
on the distances of the flow vectors from the epipolar lines. The correct 3D motion
is defined as the minimum of these functions. A topographic analysis of the
expected values showed that for all these functions the minimum lies within a
valley, and locating the exact position of the minimum within this valley under
noisy conditions generally is not possible.

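If the 3D motion is known, the depth of the scene can be computed from the
flow field. If $\hat{\mathbf{t}}$ and $\hat{\boldsymbol{\omega}}$ are the estimates of the direction of translation and
the rotation, the estimated depth $\hat{Z}$ is derived from (1) as

$$\hat{Z} = \frac{\mathbf{u}_{tr}(\hat{\mathbf{t}})\cdot\mathbf{n}}{\dot{\mathbf{r}}\cdot\mathbf{n} - \mathbf{u}_{rot}(\hat{\boldsymbol{\omega}})\cdot\mathbf{n}} \qquad (2)$$

If the estimates $\hat{\mathbf{t}}$ and $\hat{\boldsymbol{\omega}}$ correspond to the correct values of the actual rigid
motion $\mathbf{t}$ and $\boldsymbol{\omega}$, then $\hat{Z}$ will give a correct scene depth estimate, but what if
there are errors in the estimates? Then the estimated depth will be a distorted
version of the actual depth [2], [3], [8], [11].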

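Substituting for $\dot{\mathbf{r}}$ in (2) the actual motion parameters, the estimated depth
can be written as

$$\hat{Z} = Z\,\frac{\mathbf{u}_{tr}(\hat{\mathbf{t}})\cdot\mathbf{n}}{\mathbf{u}_{tr}(\mathbf{t})\cdot\mathbf{n} - Z\,\mathbf{u}_{rot}(\boldsymbol{\omega}_\epsilon)\cdot\mathbf{n}},$$

with $\boldsymbol{\omega}_\epsilon$ denoting the estimation error in rotation, $\boldsymbol{\omega}_\epsilon = \hat{\boldsymbol{\omega}} - \boldsymbol{\omega}$, or, to make
clear the distortion, as $\hat{Z} = Z\cdot D$, where

$$D = \frac{\mathbf{u}_{tr}(\hat{\mathbf{t}})\cdot\mathbf{n}}{\bigl(\mathbf{u}_{tr}(\mathbf{t}) - Z\,\mathbf{u}_{rot}(\boldsymbol{\omega}_\epsilon)\bigr)\cdot\mathbf{n}}$$

is the multiplicative distortion factor that expresses how wrong depth estimates
result from inaccurate 3D motion values. Note that $D$ also depends on $\mathbf{n}$. For
different directions $\mathbf{n}$, the distortion factor takes on different values, ranging
from $-\infty$ to $+\infty$.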
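
As a worked check of the relation between the depth estimate (2) and the distortion factor, the following Python sketch evaluates both for one image point. It is a minimal illustration under assumptions: unit focal length with r = (x, y, 1), u_tr and u_rot taken as the two terms of (1), and hypothetical function names.

```python
import numpy as np

Z_AXIS = np.array([0.0, 0.0, 1.0])  # optical axis z (assumed unit focal length)


def u_tr(t, r):
    """Translational term of (1) without the 1/Z factor."""
    return np.cross(Z_AXIS, np.cross(t, r))


def u_rot(w, r):
    """Rotational term of (1)."""
    return np.cross(Z_AXIS, np.cross(r, np.cross(w, r)))


def normal_flow(t, w, Z, r, n):
    """Normal flow value (flow projected on n) for true motion and depth Z."""
    return np.dot(u_tr(t, r) / Z + u_rot(w, r), n)


def estimated_depth(v_n, t_hat, w_hat, r, n):
    """Equation (2): depth implied by a normal flow value and a motion estimate."""
    return np.dot(u_tr(t_hat, r), n) / (v_n - np.dot(u_rot(w_hat, r), n))


def distortion_factor(t, w, t_hat, w_hat, Z, r, n):
    """Multiplicative factor D with estimated depth = Z * D."""
    w_err = w_hat - w  # rotation error, estimate minus true value
    denom = np.dot(u_tr(t, r), n) - Z * np.dot(u_rot(w_err, r), n)
    return np.dot(u_tr(t_hat, r), n) / denom
```

With the correct motion the factor is 1 and the true depth is recovered; with an erroneous motion the same normal flow value yields a depth scaled by D, and D changes with the gradient direction n.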

This concept of space distortion forms the basis of the second constraint
relating normal flow to 3D geometry, the “depth variability constraint.” The
idea is, instead of formulating smoothness constraints on the 2D optical flow
field as is usually done, to relate the smoothness of scene patches and the 3D
motion directly to normal flow. To be more specific, if we estimate the depth
from the flow field corresponding to a smooth scene patch using the correct 3D
rigid motion, we will compute a smooth scene patch. If we use an erroneous
3D motion estimate, then the obtained depth function, assuming that normal
flow vectors in different directions are available, will be rugged. Not only do
incorrect estimates of motion parameters lead to incorrect depth estimates, but
the distortion is such that the worse the motion estimate, the more likely it is
that the depth estimates vary locally more than the correct ones. Thus one
has to define functions which express the variability of depth values and search
for the 3D motion which minimizes these functions. Locally, for the flow values
$\dot{\mathbf{r}}_i$ in directions $\mathbf{n}_i$ within a region $\mathcal{R}$, we define depth variability measures as

$$\theta_0(\hat{\mathbf{t}}, \hat{\boldsymbol{\omega}}, \mathcal{R}) = \sum_{i\in\mathcal{R}} W_i \Bigl(\dot{\mathbf{r}}_i\cdot\mathbf{n}_i - \mathbf{u}_{rot}(\hat{\boldsymbol{\omega}})\cdot\mathbf{n}_i - \frac{1}{\hat{Z}}\,\mathbf{u}_{tr}(\hat{\mathbf{t}})\cdot\mathbf{n}_i\Bigr)^2$$

where $1/\hat{Z}$ is the (inverse) depth estimate minimizing $\theta_0$, which is obtained by
modeling the depth within the region using a parametric model, and $W_i$ are weights
that can be chosen differently. Solving for the minimizing $1/\hat{Z}$ and substituting
back, we obtain $\theta_1$. The sum of all $\theta_1$ in all regions, $\theta_2$, is the global function
which we are interested in minimizing.
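
For the simplest parametric model, a constant inverse depth within the region, the minimization over the depth has a closed form. The Python sketch below illustrates this special case only; the constant-depth model, the unit focal length, and all function names are assumptions made for the example, and the text leaves the actual parametric model open.

```python
import numpy as np

Z_AXIS = np.array([0.0, 0.0, 1.0])  # optical axis z (assumed unit focal length)


def tr_component(t, points, normals):
    """u_tr(t) . n_i for every measurement (rows of points and normals)."""
    u = np.cross(Z_AXIS, np.cross(t, points))
    return np.einsum("ij,ij->i", u, normals)


def rot_component(w, points, normals):
    """u_rot(w) . n_i for every measurement."""
    u = np.cross(Z_AXIS, np.cross(points, np.cross(w, points)))
    return np.einsum("ij,ij->i", u, normals)


def depth_variability(flow_n, points, normals, t_hat, w_hat, weights=None):
    """Return (theta_1, inverse depth) for one region with constant 1/Z."""
    W = np.ones(len(flow_n)) if weights is None else np.asarray(weights)
    a = flow_n - rot_component(w_hat, points, normals)  # derotated normal flow
    b = tr_component(t_hat, points, normals)
    inv_depth = np.sum(W * a * b) / np.sum(W * b * b)   # weighted least squares
    theta_1 = np.sum(W * (a - inv_depth * b) ** 2)      # theta_0 at its minimum
    return theta_1, inv_depth
```

For noise-free normal flow from a fronto-parallel patch, the measure vanishes at the correct motion and grows when the rotation estimate is perturbed, which is the behavior the depth variability constraint relies on.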

To utilize these constraints algorithmically, one has to proceed in iterations as
follows. Split the image into patches and perform a search in the space of
translational directions. For every candidate translation, estimate the best
rotation using the depth variability measure, then perform a depth segmentation
using the image data and the candidate motion parameters, and finally evaluate
the depth variability measure taking into account the segmentation. The solution
is found as the 3D motion minimizing the global depth variability measure.

In summary of the above discussion a number of conclusions can be drawn 
regarding the possible computations of space-time geometry. 

- It is not possible to estimate the 3D rigid motion reliably and very accurately
  if the field of view is limited. This is true even for the simplified case of one
  camera moving in a static environment. If there is less data available, as is the
  case when one considers the motion of objects covering a small field of view, the
  estimation is much more complicated and all we can expect are bounds on the
  motion parameters.

- It is also not possible for a real-time system to derive accurate depth
  estimates. Considering the cue of motion, this follows from the fact that 3D
  motion cannot be estimated accurately. As a result, one obtains a distorted
  depth map; it has also been shown experimentally for other cues that human
  space perception is not exactly Euclidean [16].

- Since reconstruction is not possible, the information describing the models
  must be encoded in the form of patterns in the flow field.

- The computations carrying out the space-time geometry interpretation must
  be implemented as feedback processes. This is true even for the simple, basic
  actions, as will be elaborated on in the next section, and for more complicated
  actions the interplay between bottom-up and top-down processes becomes
  extremely important.

6 Feedback 

The interpretation of motion fields is a difficult computational problem. From
the raw images only, it is not possible to compute a very accurate 2D motion
field, and from erroneous motion field measurements it is not possible to perform
good 3D motion estimation and, subsequently, scene estimation. To estimate
good flow we need information about the space-time geometry, and to estimate
the geometry well we need at least the flow field to be segmented accurately.

This seemingly chicken-and-egg situation calls for a simultaneous estimation
of image motion, 3D motion, and scene geometry, and this can only be
implemented through an iterative or feedback process. The whole interpretation
process should take the following form. First, the system estimates approximate
image velocity by combining normal flow measurements. The representation of
these estimates could be in the form of qualitative descriptions of local flow field
patches or bounds on the flow values, but it should allow for finding the more
easily detected discontinuities in the flow field. Using the flow computed in this
way, or maybe even normal flow measurements within segmented areas, an estimate
of 3D motion is derived and a partial 3D shape model of the scene is computed.
Subsequently, the computed information about the space-time geometry is fed back
to use image measurements from larger regions to perform better flow estimation
and more accurate discontinuity localization, and in turn the estimates
of 3D motion and structure are improved. Naturally, the whole interpretation
process has to be dynamic. While the system computes structure and motion
from one flow field, the geometry changes and new images are taken. Thus the
system also has to be able to make predictions and relate the 3D information
computed earlier in time to images acquired later.

There are a number of ways these feedback processes could be implemented in
a system. From a computational perspective we can ask for an optimal way, and
this will depend on the computational power of the system and the accuracy of
the estimates needed. There are many efforts in the biological sciences, in fields
such as psychophysics, neurophysiology and anatomy, to figure out the structure
and processes which exist in human and other primate brains (this structure is
generally referred to as the motion pathway), but many of these studies do not
consider that the sole purpose of motion processing is to interpret the 4D
space-time geometry. We will next elaborate on two visual illusions that in our
opinion demonstrate that the human visual system must perform feedback
processes to interpret image motion.



6.1 The Ouchi Illusion 

The striking Ouchi illusion shown in Fig. 3 consists of two black and white
rectangular checkerboard patterns oriented in orthogonal directions: a background
orientation surrounding an inner ring. Small retinal motions, or slight movements
of the paper, evince a segmentation of the inset pattern, and motion of
the inset relative to the surround. Our explanation lies in the estimation of
differently biased flow vector fields in the two patterns, which in turn give rise to
two different 3D motion estimates. When the system feeds back these motion
estimates it performs a segmentation of the image using the flow fields and the
static image texture, and because of the difference in 3D motion one pattern
appears to move relative to the other.

The bias is easily understood by considering the simplest model of image
motion estimation. Local image measurements define at a point one component
of the image flow. For example, if the image measurements are the spatial
derivatives $E_{x_i}$, $E_{y_i}$ and the temporal derivative $E_{t_i}$ of the image intensity
function $E$ at point $\mathbf{r}_i$, the flow $\dot{\mathbf{r}} = (u, v, 0)$ is constrained by the equation

$$E_{x_i} u + E_{y_i} v = -E_{t_i} \qquad (3)$$

We consider additive zero-mean independent noise in the derivatives of $E$. Let $n$
be the number of indices $i$ to which (3) applies; for convenience we write these
$n$ equations in matrix form as

$$E_s \mathbf{u} = \mathbf{E}_t \qquad (4)$$

where $E_s$ is the $n$ by 2 matrix incorporating the spatial derivatives $E_{x_i}$, $E_{y_i}$, $\mathbf{E}_t$
is an $n$ by 1 matrix incorporating the terms $-E_{t_i}$, and $\mathbf{u}$ is the vector $(u, v)$.

If (4) is solved by standard least squares estimation the solution is

$$\mathbf{u} = (E_s^T E_s)^{-1} E_s^T \mathbf{E}_t$$

and this solution, if there are errors in the spatial derivatives, is biased. The
expected value, $E(\mathbf{u})$, of $\mathbf{u}$ amounts to

$$E(\mathbf{u}) = \mathbf{u}' - n\sigma_s^2 \bigl(E_s'^T E_s'\bigr)^{-1} \mathbf{u}'$$

where primes are used here to denote actual values and $\sigma_s^2$ is the variance of the
noise in the spatial derivatives. This expected value of $\mathbf{u}$ is an underestimate
in length and lies in a direction closer to the majority of gradient directions in
the patch.
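
The bias can be reproduced with a small simulation. The following Python sketch is an illustration under assumptions that are not from the text: the temporal derivatives are kept noise-free, the gradient directions are drawn around one dominant orientation, and the function names are hypothetical. It averages least-squares estimates over many noisy copies of the spatial derivatives and compares the average with the expected value given above.

```python
import numpy as np


def least_squares_flow(Es, Et):
    """Least-squares solution of Es u = Et, i.e. (Es^T Es)^{-1} Es^T Et."""
    u, *_ = np.linalg.lstsq(Es, Et, rcond=None)
    return u


def predicted_expectation(Es_true, u_true, sigma_s):
    """Expected estimate u' - n sigma_s^2 (Es'^T Es')^{-1} u' from the text."""
    n = Es_true.shape[0]
    M = Es_true.T @ Es_true
    return u_true - n * sigma_s**2 * np.linalg.solve(M, u_true)


def mean_estimate(Es_true, u_true, sigma_s, trials=2000, seed=0):
    """Average the estimate over noisy copies of the spatial derivatives."""
    rng = np.random.default_rng(seed)
    Et = Es_true @ u_true  # assumption: temporal derivatives kept noise-free
    estimates = [
        least_squares_flow(Es_true + rng.normal(0.0, sigma_s, Es_true.shape), Et)
        for _ in range(trials)
    ]
    return np.mean(estimates, axis=0)
```

A possible use is to build `Es_true` from unit gradients whose angles cluster around one direction, call `mean_estimate` and `predicted_expectation` with the same noise level, and compare both with the true flow; the averaged estimate should be shorter than the true flow and close to the predicted value.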

It has been shown in [13], [14] that it is not only gradient based techniques 
and least squares estimation that produce bias, but other techniques (frequency- 
domain, correlation techniques, and different estimation procedures) as well. 
Correcting for the bias would require knowledge of the noise parameters and 
these are difficult to obtain. In the case of the Ouchi illusion the 3D motion, which 
is caused by small eye movements or rapid motion of the paper, changes very 
quickly so the system does not have enough data available to acquire the statistics 
of the noise. In more natural situations, however, the system can reduce the 
bias problem if it performs feedback processes. This way, after some estimates of 
motion and structure have been obtained, image measurements from larger image 
regions can be considered to acquire more accurate statistical noise sampling. 

6.2 Enigma and Variants 

The static figure Enigma, painted by Leviant and shown in Fig. 4, consists of
radial lines emanating from the center of the image and interrupted by a set of
concentric, uniformly colored rings. Upon extended viewing of Enigma, most
humans perceive illusory movement inside the rings which keeps changing direction.
Based on a study of positron emission tomography, researchers in neurophysiology
[20], [21] have argued that higher level processes of motion interpretation in
the human brain are responsible for the perception of this illusion. To be more
specific, these researchers compared the cerebral blood flow during the viewing
of Enigma and of a similar reference image which does not give rise to an illusion,
and they found the values comparable in the early motion processing area V1,
but found significantly higher values for Enigma in V5 (or MT), a later area in
visual processing.

The spatial texture in Enigma is closely related to the patterns described in
Sect. 5 that form constraints for 3D motion estimation. This lets us hypothesize
that the illusion essentially is due to higher level activity. It is due to the
particular architecture of the visual system which solves the tasks of 3D motion
estimation and segmentation through feedback processes [12]. The explanation
in more detail is as follows. Small eye movements give rise to retinal motion
signals. Because of Enigma’s particular structure, motion signals occur only in the
areas of radial lines; within the rings there is no texture and thus no early
motion signals can be picked up there. The motion signals, that is, the normal
flow vectors perpendicular to the radial lines, all belong to exactly one copoint
field, the one corresponding to s being the axis passing through the center of the
image. During the next processing stages, normal flow vectors from increasingly
larger areas are combined to estimate the rigid motion, and during these stages
some spatial and temporal integration or smoothing takes place which causes
motion signals also within the rings. After an estimate of 3D motion is found,
this information is fed back to the earlier processing elements, which then have to
perform exact localization of independently moving objects. At this stage,
temporal integration cancels out the flow information within the areas covered by
the rays. It does not, however, cancel out the motion within the homogeneous
regions, since the responses from the processing elements there do not contradict an
existing motion in these areas. Since, at this stage, the system’s task is to
accurately segment the scene, the edges of the homogeneous regions are perceived as
motion discontinuities and motion within the homogeneous regions is reinforced.

The defining feature of Enigma is that all the spatial gradients in its patterns
excite only one class of vector fields involved in 3D motion estimation, a copoint
field. To test our hypothesis we created figures based on the same principle as
Enigma, that is, with black and white rays giving rise to normal flow vectors
corresponding to exactly one coaxis or copoint vector field and homogeneous
areas perpendicular to these rays. For an example, see Fig. 5. We found that it is
exactly these patterns that give rise to illusory motion. Other figures based on the
same principle (rays interrupted by perpendicular homogeneous areas) which do
not correspond to the vector fields involved in 3D motion estimation have been
found not to give rise to an illusion, thus supporting the hypothesis.

Fig. 3. A pattern similar to one by Ouchi [18]

Fig. 4. Enigma, giving rise to the Leviant illusion: Fixation at the center results in
perception of a rotary motion inside the rings



7 Complex Actions 

One may be able to derive a simple procedure for recognizing some action based
on the statistics of local image movement. There exists a variety of such
techniques in the current literature, mostly applied to human movement or facial
expressions, but according to our account of what it means to understand an
action, representations that enable this sort of recognition are simply inadequate.
What we need are representations that can visualize the action under
consideration. Let us summarize a number of key points that are important in
figuring out the nature of action representations:

1. Action representations should be view independent. We are able to recognize
and visualize actions regardless of viewpoint.

2. Action representations capture dynamic information which is manifested in
a long image sequence. Put simply, it is not possible to understand an action
on the basis of a small sequence of frames (viewpoints).

3. It is the combination of shape and movement that makes up an action
representation.

Fig. 5. Example of an illusory movement figure based on a coaxis field. Here the focal
length is chosen to be about equal to the size of the image. Thus, the illusion is best
experienced at a short distance

A number of issues become clear from the imposition of the constraints listed
above. First, to understand human action, a model of the human body should be
available. Second, the movement of the different parts of the body relative to each
other and to the environment should be recovered in a way that allows matching.
Third, since action amounts to the movement of 3D space, action representations
should contain information about both components.

Let us further examine those issues in order. Acquiring a sophisticated model of
the human body, although of extreme practical importance, is not essential to our
discussion. We would like to caution, however, against learning approaches. It is
probably an impossible task to learn such a model from visual data in a reasonable
time. It is quite plausible that humans are born with such a model available to
them. After all, it is a model of themselves. Later, we will describe technological
solutions to this question.

The second issue deserves more attention, because it introduces constraints. The
previous section discussed the difficulties in acquiring 3D motion and shape
when normal motion fields were available in the whole visual field. From this
we conclude that it is impossible to acquire exact 3D motion information from
a very small part of the visual field, that is, it is not feasible to accurately
estimate the 3D motion of the different rigid parts of the body from image
sequences of it. Whatever the action representation is, it appears very hard to
achieve recognition in a bottom-up fashion. The concept of feedback explained
in the previous section acquires major importance here. Recognizing action
should be mostly a top-down process.

Finally, the third issue reflects the need to incorporate both shape and motion in
the representation. This means that any motion representation should be spatially
organized. There are, however, various options here. Do we explicitly represent
movement in terms of motion fields, or do we do it implicitly in terms of forces,
dynamics, and inertial constraints? Both options have their pros and cons.

8 How Could Action Representations Be? 

To gain insights on action representations we consider them in a hierarchy with
two more kinds of representations and the properties of the possible mappings
among them. First there is the image data, that is, videos of humans in action.
Considering the cue of motion, our image data amounts to a sequence of
normal flow fields computed from the videos. The second kind of representations
are intermediate descriptions encoding information about 3D space and
3D motion, estimated from the input (video). These representations consist of
a whole range of descriptions of different sophistication partially encoding the
space-time geometry, and they are scene dependent. Finally, we have the action
representations themselves, which are scene independent.

Before we start considering the relationships in this hierarchy, let us introduce
a novel and very useful tool in the study of action. It is a tool because it
can help us in this study, but at the same time it amounts to a theoretical
intermediate representation that combines the advantages of view independence with
the capture of the action as a whole. It is a theoretical tool in the same way that
the image motion field is a theoretical construct. An intermediate representation
for the specific action in view is then a sequence of evolving 3D motion fields,
and it is the most sophisticated intermediate description that could be obtained.
Acquiring this representation is no simple matter, but it can be achieved by
employing a very large number of viewpoints. We have established in our
laboratory a multi-camera network (sixty-four cameras, Kodak ES-310, providing
images at a rate of eighty-five frames per second; the video is collected directly
on disk, and the cameras are connected by a high-speed network possessing sixteen
dual processor Pentium 450s). By observing human action from all viewpoints
we are able to create a sequence of 3D motion fields [10]. Fig. 6 displays
a schematic description of our facility. The solution requires a calibration
of the camera network [1] and proceeds by carving the rays in space using image
intensity and image motion values.

Let us now consider the mapping from the second level of the hierarchy
(intermediate descriptions and, in particular, 3D motion fields) to the scene
independent action representations (top level). This mapping should be such
that it extracts from a specific action quantities of a generic character common
to all actions of the same type. These quantities most probably take the form of
spatiotemporal patterns in four dimensions.

Fig. 6. A negative spherical eye

One way of obtaining such patterns is to perform statistics on a large enough
sample. Considering, for example, a particular action (e.g., walking), we can
obtain data in the multi-camera laboratory described before for a large number
of individuals. In each case we can obtain a 3D motion field and thus have at our
disposal a large number of 3D motion fields. A number of statistical techniques,
such as principal component analysis, can reduce the dimensionality of this space
and describe it with a small number of parameters, and these parameters will in
turn constitute a representation for the generic action under consideration.
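
As an illustration of the statistical route, the following Python sketch applies principal component analysis to a stack of motion fields. It is a minimal sketch under assumptions: each individual's sequence of 3D motion fields is taken to be resampled and flattened into one row vector of equal length, and the function names are hypothetical; no data from the laboratory described above is implied.

```python
import numpy as np


def pca_action_model(fields, n_components):
    """Fit PCA to an (n_subjects, n_features) array of flattened motion fields.

    Returns the mean field, the principal directions (one per row) and the
    low-dimensional coefficients describing every subject."""
    mean = fields.mean(axis=0)
    centered = fields - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coeffs = centered @ basis.T
    return mean, basis, coeffs


def reconstruct(mean, basis, coeffs):
    """Map low-dimensional coefficients back to full motion fields."""
    return mean + coeffs @ basis
```

The coefficients play the role of the small number of parameters mentioned above; how many components are needed for a given action would have to be determined from real data.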

Another way of obtaining these patterns would be to employ a geometric 
approach as well. Since a human action consists of a large number of differ- 
ent rigid motions, one possibility is to obtain these rigid motions from the 3D 
motion field, encode their relationships, and study invariances related to sym- 
metry, and geometric quantities in space-time (angles, velocities, accelerations, 
periodicity, etc.). 

Considering now the mappings from the top level to the image data and
to the intermediate representations, we can gain further insight. For the sake
of recognition, action descriptions will have to be projected on the image and
matched against the intermediate descriptions and the image data. But, as
explained in previous sections, these intermediate descriptions are error-prone and
constitute distorted versions of the scene. In addition, there is variability among
actions of the same type. In trying to perform this matching in space we will
have to revise both the action representations and the nature of the intermediate
descriptions so that the matching can be facilitated. In actual fact we may
have to develop specific spatiotemporal patterns in the image data for this goal,
and these patterns will constitute some form of action representations for quick
recognition. To conclude, the interplay among the mappings between the introduced
hierarchy levels seems to be a feasible and structured way for building
powerful action descriptions. This constitutes our current research efforts.

9 Conclusions 

Being truthful to Hartmann’s and Lorenz’s approach regarding evolution and 
the structure of intelligent systems, we wish to study intelligent systems as they 
actually are, using the present structures in their bodies as our data. Consid- 
ering representations for a number of important complex actions, we have two 
options about their origin: either the system learned them or it was born with 
them. We find the argument about the innate character of (at least some) action
representations very appealing, because humans have retained the same basic 
structure and have been performing, for the most part, the same repertoire of 
actions for a very long time. It makes good sense to build these representations 
using all available knowledge and technology. This paper offered a number of 
ways for accomplishing this goal.

Abstract. The sense of touch and the capability to analyze potential
contacts is important to many interactions of robots, such as planning 
exploration, handling objects, or avoiding collisions based on sensing 
of the environment. It is a pleasant surprise that the mathematics of 
touching and contact can be developed along the same algebraic lines 
as that of linear systems theory. In this paper we exhibit the relevant 
spectral transform, delta functions and sampling theorems. We do this 
mainly for a piecewise representation of the geometrical object boundary 
by Monge patches, i.e. in a representation by (umbral) functions. For this 
representation, the analogy with the linear systems theory is obvious, 
and a source of inspiration for the treatment of geometric contact using 
a spectrum of directions. 



1 Kissing Contact 

1.1 Intuition and Simplification 

When two objects touch (Fig. 1), they are locally tangent. More precisely, at the 
point of contact they have tangent planes with a common attitude and location, 
but opposite orientation. Such a contact is called ‘kissing’. 

Kissing is a local property. It is often made impossible when the objects would 
penetrate each other at other locations along their boundary. This makes the 
operation hard to analyze (for instance, it becomes non-differentiable), and one 
often does not get further than a lattice-theoretic (and hence rather qualitative) 
analysis [8]. To go beyond that, we must regularize the operation, and boldly
decide to ignore those overlaps elsewhere: we analyze the mathematics of local
kissing first, and consider the exclusion of situations in which penetration occurs
as a worry for later (not treated in this paper).

Two objects are involved in the kissing, A and B, and obviously only their
boundaries are of concern to the operation. To characterize the result of the
kissing of the boundaries, it is customary to consider that result as a boundary
itself. We denote it by $A \mathbin{\breve{\oplus}} B$. This boundary exists in the configuration space
of the objects A and B, but to simplify the analysis we will bring it down to the
‘task space’ of locations in which A and B are defined as objects. This is done
as follows.

Both A and B are rigid bodies, characterized as a set of points in their own
frame of reference, i.e. as a set of (translation) vectors in the vector space $V^m$ in
which the objects reside. These sets need not be connected: A could represent
all the obstacles in a scene, B could be the moving robot. Consider the kissing
from the point of view of A, i.e. keep A stationary and move B around her. The
various configurations of the rigid body B can be denoted by a translation T
and a rotation R, as T R B. The space spanned by the characterizing parameters
of T and R is called configuration space; it is 3-dimensional if A and B are
2-dimensional objects in $V^2$, and 6-dimensional for 3-dimensional A and B in $V^3$.
The configurations at which kissing occurs determine some curved hypersurface
in this space. This surface is actually a boundary since it has an obvious inside
and outside: we can (locally) freely move into a kissing configuration, but are
not allowed locally to penetrate the object beyond the kissing contact point.
Obviously, the rotations make for complicated boundaries in the configuration
space. It is therefore customary to focus on translational motions, analyzing
those for a fixed rotation. We do the same.

Fig. 1. Two objects in contact in 2-dimensional space (panel labels include
‘double kiss’ and ‘kiss with penetration’). The rightmost example
shows how penetration is ‘beyond the double kiss’, changing one of the kisses into
penetration; but it makes the curve of kissing positions differentiable (dashed)

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 22-47, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000

Since the configuration space of translations of objects is obviously of the
same dimensionality as the sets of translation vectors defining the object points,
the boundaries of A, B and $A \mathbin{\breve{\oplus}} B$ may all be considered to reside in the same
‘task space’ of the robot. We could refer to $A \mathbin{\breve{\oplus}} B$ as a ‘kissing boundary’, and
we will treat it as the boundary of an object of the same reality as A and B.



1.2 Geometrical Objects and Boundary Representation 

For robotics, the boundaries of interest are those of actual geometrical objects as
they occur in the ‘real world’. These can be characterized in many different ways:
approximately as polyhedra, exactly as parametrized oriented hypersurfaces. In
all cases, there should be a way of retrieving the set of oriented tangent planes at
each point of the boundary. Most representations do this implicitly, by specifying
the boundary facets or points and providing the possibility of differencing or
differentiating them to obtain the tangent information. It will be convenient for
us to view a point of a boundary as a set of (position, tangent space)-pairs. This
is a local representation of the surface, independent of whether other points are
known. It is realistic in robotics, where the complete boundary information may
not be available due to partial sensing. We denote position by p, and characterize
the tangent hyperplane at p by I[p]. We will represent it computationally using
geometric algebra [9,10] as an $(m-1)$-blade, but you can read I[p] as a symbolic
notation if you are unfamiliar with that framework. We will also characterize
it, dually, by the more classical inward pointing normal vector n[p] (although
this involves introducing a metric to specify perpendicularity, whereas kissing is
actually an affine, and even conformal, property). We then associate orientation
of the tangent plane and the direction of the inward pointing normal in a
consistent manner. In 2-dimensional space $V^2$, we use the convention that the inward
pointing normal is achieved by a counterclockwise turn of the tangent vector
I, if the spatial area element has counterclockwise orientation. In 3-dimensional
space, we will relate the orientation of n and the oriented tangent plane I by a
left-hand screw relationship if the space has a right-handed volume element $I_3$
(see Fig. 2). (In m-dimensional space with pseudoscalar (hypervolume element)
$I_m$, we may generalize this rule to $n[p] = -I[p]/I_m$.)

Fig. 2. The relative orientation of I and n in 2-dimensional Euclidean space
(left) and 3-dimensional Euclidean space (right) with right-handed orientation
convention

To keep things simple, we will take smooth surfaces with a unique tangent 
plane at every point; tangent cones would require too much administration, 
obscuring the essence of this paper. 

If we have a tangent plane at a location p (let’s refer to that as an ‘offset
tangent plane’), it is convenient to encode this using homogeneous coordinates,
or more generally using the homogeneous model of Euclidean geometry provided
by geometric algebra [9]. This is an embedding of Euclidean m-space in a vector
space of 1 dimension higher. We denote the extra dimensional direction by $e_0$,
with reciprocal $e^0$, so that $e_0 \cdot e^0 = 1$. Both $e_0$ and its reciprocal are
orthogonal to any vector of the space with pseudoscalar $I_m$, so $e_0 \cdot I_m = 0$
and $e^0 \cdot I_m = 0$. The homogeneous model embeds the point at location p as the
vector

$$ e_0 + p $$

in (m+1)-dimensional space. (This is thus very similar to homogeneous coordinates,
where $p = (p_1, p_2)^T$ is embedded as $(1, p_1, p_2)^T$.) The offset tangent
plane at location p with tangent I[p] is represented in the homogeneous model
by the m-blade:

$$ (e_0 + p) \wedge I[p] \tag{1} $$

(If you are not familiar with geometric algebra, you may view this computational
expression as a mathematical shorthand; you should still be able to follow the
reasoning that follows, although you will miss out on its computational precision.)
It is more convenient to consider the position as a function of the tangent
rather than vice versa, so that we will treat eq.(1) as:

$$ \mathcal{R}[I] = (e_0 + p[I]) \wedge I. \tag{2} $$

Note that this re-indexing from I[p] to p[I] requires an ‘inversion of the deriva- 
tive’ which is conceptually straightforward. Yet to achieve it, for instance when 
the object boundary has been given analytically, requires a functional inversion 
which may not have a closed-form solution. We will not worry about these
algorithmic issues for now. The inversion also leads to multi-valuedness, since the
same tangent hyperplane attitude I may occur at several positions (if the object 
is not convex). We take all computations to be implicitly overloaded on these 
multi-valued outcomes, rather than messing up our equations with some index 
or set notation. 

In our dual representation of the tangent plane by a normal vector, we need
to take the dual of eq.(2), and characterize both p and the representation as a
function of n rather than I. We denote them by p[n] and $\mathcal{R}[n]$. We obtain by
dualization:

$$ \mathcal{R}[n] = (e_0 + p[n]) \cdot (e^0 \wedge n) = n - e^0 (p[n] \cdot n) = n + e^0 \sigma[n] \tag{3} $$

where we defined the support function

$$ \sigma[n] = -p[n] \cdot n, $$

which gives the signed distance of the tangent plane to the origin, positive when
the origin is (locally viewed) on the ‘inside’. The support function for non-convex
objects is multi-valued, since there may be more than one location where the
inward pointing normal equals n.

The representation in eq.(3) is now, geometrically, the support function plotted
in the $e^0$-direction, as a function of n. Since n denotes the directions of
normal vectors, it parametrizes the Gaussian sphere of directions, the range of
the Gauss map. The distance function is a function on the Gauss sphere. But
the geometrization of this function using geometric algebra offers advantages in
computation, since duality becomes simply taking the orthogonal complement
(through division by $e_0 I_m$).

    Example: In the case of a solid sphere of radius $\rho$ centered around the
    origin, the representation has support function $\rho$ at each n; therefore the
    representation is $\mathcal{R}[n] = n + e^0 \rho$. Note that this also applies when
    $\rho$ is negative; this gives a spherical hole. A point at the origin (which is a
    sphere of radius 0) has as representation $\mathcal{R}[n] = n$. A point at location
    q has as representation $\mathcal{R}[n] = n - e^0 (q \cdot n)$.
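
The support function values in this example can be checked numerically. The
following is a minimal sketch in the 2-dimensional case (a circle instead of a
sphere); the helper names `unit`, `support` and `circle_point` are hypothetical
and only illustrate the definition $\sigma[n] = -p[n] \cdot n$.

```python
import math

def unit(theta):
    """Unit normal n at angle theta on the 2D Gauss circle."""
    return (math.cos(theta), math.sin(theta))

def support(p, n):
    """Support function value sigma[n] = -(p . n) for a boundary point p with inward normal n."""
    return -(p[0] * n[0] + p[1] * n[1])

def circle_point(rho, n):
    """Boundary point of a circle of radius rho about the origin whose inward normal is n.

    The inward normal points towards the center, so the point sits at -rho * n.
    """
    return (-rho * n[0], -rho * n[1])

rho = 2.0          # illustrative radius
q = (3.0, -1.0)    # illustrative point location
for k in range(8):
    n = unit(2.0 * math.pi * k / 8)
    sigma_circle = support(circle_point(rho, n), n)  # constant, equal to rho
    sigma_point = support(q, n)                      # equal to -(q . n)
    print(f"{sigma_circle:.3f}  {sigma_point:+.3f}")
```

The first column stays at the radius for every direction, while the second
column varies with n as $-(q \cdot n)$.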

The object representation based on the multi-valued support function is invertible.
Intuitively, you can see this: from a specification of the tangent planes of
all points around p, one should be able to reconstruct p as the intersection of
these differentially different planes. You should think of this reconstruction of
a surface from its tangent planes as the computation of a caustic. In geometric
algebra, this is the meet of the neighboring tangent planes, and it can be dually
computed as the join. To determine it, we need the derivative of $\mathcal{R}$ (or if you
prefer, the derivative of the support function on the Gauss map); the dual of
this derivative tangent plane to the representation is proportional to p. This is
described in [4] using differential geometry in the 2-dimensional case, and in [2]
using geometric algebra for the m-dimensional case. Computable formulas result,
but repetition of those here would be a bit involved and not lead to much
insight. (We’ll see a simpler inversion formula for the two-dimensional case in
section 2.2.)
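
For the planar case the reconstruction can be sketched with the classical
envelope-of-lines computation. This is not the geometric algebra formula of [2],
only an illustration under stated assumptions: the normal is parametrized as
$n(\theta) = (\cos\theta, \sin\theta)$, the tangent plane is the line
$x \cdot n = -\sigma$, and differentiating to $\theta$ gives
$p = -\sigma\, n - \sigma'\, n'$ with $n' = dn/d\theta$. The function names are
hypothetical, and $\sigma'$ is approximated by a central difference.

```python
import math

def sigma_point(q):
    """Support function of a point at q, as a function of the normal angle theta."""
    return lambda theta: -(q[0] * math.cos(theta) + q[1] * math.sin(theta))

def sigma_circle(center, rho):
    """Support function of a circle with the given center and radius rho."""
    return lambda theta: rho - (center[0] * math.cos(theta) + center[1] * math.sin(theta))

def reconstruct(sigma, theta, h=1e-5):
    """Boundary point p = -sigma n - sigma' n' recovered from a planar support function."""
    n = (math.cos(theta), math.sin(theta))
    dn = (-math.sin(theta), math.cos(theta))                 # derivative of n to theta
    ds = (sigma(theta + h) - sigma(theta - h)) / (2.0 * h)   # central difference for sigma'
    s = sigma(theta)
    return (-s * n[0] - ds * dn[0], -s * n[1] - ds * dn[1])

# A point is recovered at every theta; a circle yields center - rho * n(theta).
print(reconstruct(sigma_point((3.0, -1.0)), 1.1))
print(reconstruct(sigma_circle((1.0, 2.0), 0.5), 0.0))
```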

1.3 Kissing Objects 

When two objects A and B kiss, they have (locally) opposite tangents, I and -I,
at the contact point. This contact generates a point on the resulting boundary
which is the addition of the two position vectors (see Fig. 3a). In the system
of A, this is $p_A[I] - p_B[-I]$. It makes the mathematics more convenient (fewer
signs to keep track of) if we see this as a basic operation between A and an
object $\breve{B}$ which is the point-mirrored version of B, defined by:

$$ p_{\breve{B}}[I] = -p_B[-I]. $$

Then the result of the kissing of A onto B is identical to the operation of Huygens
wave propagation using A as primary wave front, and $\breve{B}$ as propagator (see
Fig. 3b). We denote this operation of wave propagation by $\check{\oplus}$. In [3] it was
called tangential dilation, since it is the ‘dilation’ operation of mathematical
morphology specialized to the boundaries of objects. The dilation is the Minkowski
sum of the sets of points representing the objects, and is therefore an operation on
their volume; but this may be reconstructed from the much more tractable tangential
dilation of their boundaries.

So the kissing of A by an object B is the tangential dilation of A with the
point-mirrored object $\breve{B}$ (and vice versa):

$$ A \otimes B = A \,\check{\oplus}\, \breve{B}. $$

From now on, we treat the tangential dilation only, but this identity makes all
results transferable to kissing.

Fig. 3. (a) Kissing of boundaries, $p_{A \otimes B}(I) = p_A(I) - p_B(-I)$. (b) Propagation
or tangential dilation of boundaries, corresponding to (a) and generating the
same boundary point; $p_{A \check{\oplus} \breve{B}}(I) = p_A(I) + p_{\breve{B}}(I)$

The resulting point(s) of $A \,\check{\oplus}\, B$ as a consequence of the tangential dilation
of a point $p_A$ with tangent I on A and a point $p_B$ with tangent I on B is then
the point $p_A + p_B$. The tangent of the result is also I (as may be seen by local,
first-order variation of $p_A$ and $p_B$). For the representation this addition has as
its consequence:

$$ \mathcal{R}_{A \check{\oplus} B}[I] = e_0 \wedge I + (p_A[I] + p_B[I]) \wedge I \tag{4} $$

and for the dual representation

$$ \mathcal{R}_{A \check{\oplus} B}[n] = n - e^0 (p_A[n] + p_B[n]) \cdot n = n + e^0 (\sigma_A[n] + \sigma_B[n]). $$

So, under tangential dilation, the support functions add up:

$$ \sigma_{A \check{\oplus} B}[n] = \sigma_A[n] + \sigma_B[n]. \tag{5} $$

(Since support functions are multi-valued, this may involve multiple additions, 
and a more proper notation would be to indicate this overload of the addition 
as a Minkowski sum of the set of values, but this is again administrative rather 
than insightful.) 
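
As a small illustration of eq.(5), consider two circles in the plane, for which
the support function is single-valued. The sketch below assumes the support
function of a circle with center c and radius $\rho$ is $\rho - c \cdot n$ (which
follows from $\sigma[n] = -p[n] \cdot n$ with $p[n] = c - \rho\, n$); the names
`sigma_circle` and `dilate` are hypothetical.

```python
import math

def sigma_circle(center, rho):
    """Support function of a circle, as a function of the normal angle theta."""
    return lambda theta: rho - (center[0] * math.cos(theta) + center[1] * math.sin(theta))

def dilate(sigma_a, sigma_b):
    """Tangential dilation in the support function representation: pointwise addition."""
    return lambda theta: sigma_a(theta) + sigma_b(theta)

# Dilating circle (c1, r1) with circle (c2, r2) should give circle (c1 + c2, r1 + r2).
dilated = dilate(sigma_circle((1.0, 2.0), 0.5), sigma_circle((-3.0, 1.0), 2.0))
expected = sigma_circle((-2.0, 3.0), 2.5)
for k in range(4):
    theta = 0.5 * math.pi * k
    print(f"{dilated(theta):+.6f}  {expected(theta):+.6f}")
```

Centers and radii both add, which is the Minkowski sum of the two discs.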

2 Objects as Umbral Functions 

It is interesting to see what happens locally. To study this, we denote the object 
by a Monge patch, i.e. we choose some local hyperplane relative to which the 
object surface may be described by a function. Such a function should still be 
endowed with a notion of ‘inside’; shading the points ‘under’ the function then 
leads to the name umbral function (i.e. a function with a shadow) [7]. 

2.1 Patch Representation; Legendre Transform 

The Monge patch description therefore involves choosing a function direction,
denoted by e, and introducing an (m-1)-dimensional vectorial coordinate x to
describe the position in a plane perpendicular to this direction (so $x \cdot e = 0$). The
consequences for the representation then follow immediately from this.

In our treatment of objects by Monge patches, we are mostly interested
in the case m = 2, since this paper will demonstrate the analogy with the
linear filtering of 1-dimensional scalar functions. So we may introduce $e_2$ as the
special ‘function’ direction e of the patch description, and $e_1$ as the orthogonal
coordinate direction for a coordinate x. Thus a point p on the object boundary
can be written as a function $p : \mathbb{R} \to V^2$ defined by $p(x) = x e_1 + f(x) e_2$ with
$f : \mathbb{R} \to \mathbb{R}$ locally encoding the boundary.

The tangent plane now has a direction which is obtained by differentiation
of p(x) to x, and proper orienting. It is denoted by the tangent vector
$I[x] = -e_1 - f'(x) e_2$, or dually by the normal vector
$n[x] = -I[x]/I_2 = f'(x) e_1 - e_2$.
Then we compute the dual representation of this patch as:

$$ \begin{aligned}
\mathcal{R}[n[x]] &= n[x] - e^0 \, (p[x] \cdot n[x]) \\
 &= f'(x) e_1 - e_2 - e^0 \, (x e_1 + f(x) e_2) \cdot (f'(x) e_1 - e_2) \\
 &= f'(x) e_1 - e_2 + e^0 \, (f(x) - x f'(x)).
\end{aligned} \tag{6} $$

This is a spatial curve which may be parametrized by its $e_1$ coordinate. Let us
denote that coordinate by $\omega$:

$$ \omega = f'(x). $$

Then we need the value of x where the slope equals $\omega$ to rewrite the curve in
terms of $\omega$. If f' is invertible, this is $x = f'^{-1}(\omega)$ and the curve becomes

$$ \mathcal{R}[n[\omega]] = \omega e_1 - e_2 + e^0 \left( f(f'^{-1}(\omega)) - f'^{-1}(\omega)\, \omega \right). $$

But this is a clumsy notation for computations. We can improve the formulation
by introducing a notation for the ‘stationary value of a function’

$$ \operatorname{stat}_u[g(u)] = \{ g(u_*) \mid g'(u_*) = 0 \}. $$

This is multi-valued, since there may be several extrema, of different magnitudes.
(We will also encounter functions with multi-valued derivatives and in
that case we should read $g'(u_*) = 0$ in the definition above as ‘the derivative
of g at $u_*$ contains 0’, so $g'(u_*) \ni 0$.) With this ‘stat’ operation, we can encode
the definition of $\omega$ nicely in the stat expression specifying the $e^0$-component, for
we observe that

$$ \operatorname{stat}_u[f(u) - u\omega] = \{ f(u_*) - u_* \omega \mid f'(u_*) - \omega = 0 \} $$

is precisely the value we need. Therefore we rewrite

$$ \mathcal{R}_f[\omega] = \omega e_1 - e_2 + e^0 \operatorname{stat}_u[f(u) - u\omega]
 = \omega e_1 - e_2 + \mathcal{L}[f](\omega)\, e^0, $$

where we defined the (extended) Legendre transform^1 of f by:

$$ \mathcal{L}[f](\omega) = \operatorname{stat}_u[f(u) - u\omega] = \{ f(u_*) - u_* \omega \mid f'(u_*) = \omega \}. \tag{7} $$

The dual representation of eq.(6) shows that in the plane with $e_2$-coordinate
equal to -1, parallel to the $(e_1 \wedge e^0)$-plane, it is the (extended) Legendre
transform as a function of $\omega$.

For Monge patches and umbral functions, the Legendre transform $\mathcal{L}[f]$ thus
plays the role of the support function $\sigma$ in the purely geometrical framework.
It denotes the support (a measure of the distance to the origin) of the tangent
hyperplane with slope $\omega$, through its intercept with the $e_2$-axis, see Fig. 4a,b.
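
The multi-valued character of eq.(7) can be made concrete with a small numerical
sketch. It assumes the derivative f' is available in closed form and locates the
stationary points on a finite interval by sign changes on a grid followed by
bisection; tangential (double) roots without a sign change are not detected. The
function names and the search interval are hypothetical choices.

```python
def stationary_points(dg, lo, hi, steps=2000, tol=1e-12):
    """All u in [lo, hi] where dg changes sign (or vanishes on a grid point)."""
    roots = []
    du = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * du, lo + (i + 1) * du
        ga, gb = dg(a), dg(b)
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0.0:
            # Bisection on the bracketing interval [a, b].
            while b - a > tol:
                m = 0.5 * (a + b)
                gm = dg(m)
                if ga * gm <= 0.0:
                    b = m
                else:
                    a, ga = m, gm
            roots.append(0.5 * (a + b))
    if dg(hi) == 0.0:
        roots.append(hi)
    return roots

def legendre(f, df, omega, lo=-3.0, hi=3.0):
    """Extended Legendre transform: sorted list of f(u*) - u* omega with f'(u*) = omega."""
    points = stationary_points(lambda u: df(u) - omega, lo, hi)
    return sorted(f(u) - u * omega for u in points)

# Convex parabola: one value per slope.
print(legendre(lambda u: 0.5 * u * u, lambda u: u, 1.0))
# Non-convex cubic: two branches at slope 0.
print(legendre(lambda u: u ** 3 - 3.0 * u, lambda u: 3.0 * u * u - 3.0, 0.0))
```

For the parabola $u^2/2$ the transform is $-\omega^2/2$; for the cubic the two
tangents with the same slope give two intercepts, as in Fig. 4b.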

Fig. 4. (a) The Legendre transform for a convex function. (b) The multi-valued
extended Legendre transform for a non-convex function

^1 We call it ‘extended’ since the Legendre transform does not usually admit
multi-valuedness, being applied to purely convex or concave functions only; but the
basic transform principle is so similar that it hardly justifies a new name. We
originally introduced it under the name ‘slope transform’ [3][8], but now prefer
honoring Legendre.

Since the support function representation of objects is invertible, so is the
intercept characterization of these supports by the Legendre transform. And
indeed the (extended) Legendre transform is invertible, through

$$ \mathcal{L}^{-1}[F](x) = \operatorname{stat}_\nu[F(\nu) + x\nu], \tag{8} $$

in the sense that

$$ \mathcal{L}^{-1}[\mathcal{L}[f]] = f. $$

This is easily shown (if you are willing to take some properties of the multi-valued
‘stat’ operator for granted):

$$ \begin{aligned}
\mathcal{L}^{-1}[\mathcal{L}[f]](x)
 &= \operatorname{stat}_\nu \big[ \operatorname{stat}_u[f(u) - \nu u] + x\nu \big] \\
 &= \operatorname{stat}_\nu \big[ \{ f(u_*) + \nu (x - u_*) \mid f'(u_*) - \nu = 0 \} \big] \\
 &= \{ f(u_*) + \nu_* (x - u_*) \mid f'(u_*) - \nu_* = 0 \text{ and } x - u_* = 0 \} \\
 &= \{ f(x) + \nu_* (x - u_*) \mid \nu_* = f'(u_*) \text{ and } x = u_* \} \\
 &= f(x)
\end{aligned} $$

(a singleton set results, so we drop the set notation).^2
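
The round trip can also be checked numerically in the single-valued case. The
sketch below assumes a strictly convex f (here cosh, an arbitrary choice), so
that each slope has exactly one stationary point, and it uses the identity
$d\mathcal{L}[f]/d\nu = -u_*(\nu)$, which follows from differentiating
$f(u_*) - u_* \nu$ with $f'(u_*) = \nu$. All names and search intervals are
hypothetical.

```python
import math

def bisect(g, lo, hi, tol=1e-12):
    """Root of a continuous g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if glo * gmid <= 0.0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return 0.5 * (lo + hi)

def legendre(f, df, omega, lo=-10.0, hi=10.0):
    """Return (L[f](omega), u*) with f'(u*) = omega, for strictly convex f."""
    u = bisect(lambda s: df(s) - omega, lo, hi)
    return f(u) - u * omega, u

def inverse_legendre(F, dF, x, lo=-10.0, hi=10.0):
    """L^{-1}[F](x) = F(v*) + x v*, with v* solving F'(v*) + x = 0 on the slope interval."""
    v = bisect(lambda s: dF(s) + x, lo, hi)
    return F(v) + x * v

f = lambda u: math.cosh(u)
df = lambda u: math.sinh(u)
F = lambda v: legendre(f, df, v)[0]
dF = lambda v: -legendre(f, df, v)[1]   # derivative of the transform equals -u*(v)

for x in (-1.5, 0.0, 0.8):
    print(x, f(x), inverse_legendre(F, dF, x))
```

The last two columns agree up to the bisection tolerance, as long as x stays
inside the range of positions reachable with slopes in the search interval.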

^2 To do all this properly using multi-valued functions is a bit of a pain, and
requires making sure that one keeps track of the various branches of the Legendre
transform caused by different convex/concave portions of the original functions.
This takes administration rather than essential mathematics. It can be avoided by
describing the boundaries as parametrized curves rather than as functions, but this
would lose the obvious similarity to the Fourier transform. It is all just a matter
of choosing the most convenient representation of an algebraic intuition which is
the same in all cases, and we will not worry about such details.

2.2 Tangential Dilation of Umbral Functions

In the object representations by support functions, we saw that the tangential
dilation is additive. This additivity of the support functions under tangential
dilation refers to a translational shift of the tangents. It transfers directly to
additivity of the intercepts of umbral functions, i.e. to Legendre transforms.
Since the Legendre transform is invertible, we can use this to define what we
mean by tangential dilation of umbral functions:

$$ \mathcal{L}[f \,\check{\oplus}\, g](\omega) = \mathcal{L}[f](\omega) + \mathcal{L}[g](\omega). \tag{9} $$

Inversion of this gives the formula for $f \,\check{\oplus}\, g$ as an operation on umbral
functions:

$$ \begin{aligned}
(f \,\check{\oplus}\, g)(x) &= \mathcal{L}^{-1}\big[ \mathcal{L}[f] + \mathcal{L}[g] \big](x) \\
 &= \operatorname{stat}_\nu \big[ \mathcal{L}[f](\nu) + \mathcal{L}[g](\nu) + \nu x \big] \\
 &= \operatorname{stat}_\nu \operatorname{stat}_u \operatorname{stat}_v \big[ f(u) + g(v) + (x - u - v)\, \nu \big] \\
 &= \operatorname{stat}_u \operatorname{stat}_v \operatorname{stat}_\nu \big[ f(u) + g(v) + (x - u - v)\, \nu \big] \\
 &= \operatorname{stat}_u \operatorname{stat}_v \big[ \{ f(u) + g(v) + (x - u - v)\, \nu \mid x - u - v = 0 \} \big] \\
 &= \operatorname{stat}_u [ f(u) + g(x - u) ].
\end{aligned} $$

Therefore we obtain as the actual definition of tangential dilation of umbral
functions:

$$ (f \,\check{\oplus}\, g)(x) = \operatorname{stat}_u [ f(u) + g(x - u) ]. \tag{10} $$

You may verify that this has indeed the desired properties at the corresponding
points with coordinates $x_f$, $x_g$ and $x_{f \check{\oplus} g}$:

$$ \begin{aligned}
x_{f \check{\oplus} g} &= x_f + x_g \\
(f \,\check{\oplus}\, g)(x_{f \check{\oplus} g}) &= f(x_f) + g(x_g) \\
(f \,\check{\oplus}\, g)'(x_{f \check{\oplus} g}) &= f'(x_f) = g'(x_g)
\end{aligned} \tag{11} $$

In this function description, the tangential dilation has been studied extensively,
though almost exclusively with an interest in the globally valid dilation
corresponding to an actual collision process, i.e. not permitting intersection of the
boundary functions. That globalization is achieved by a replacement of the ‘stat’
in eq.(10) by supremum and/or infimum operations. The literature on convex
analysis [12] and mathematical morphology [8] is then relevant to its study.

2.3 Convolution and the Fourier Transform

It is interesting to compare the tangential dilation of (umbral) functions with
the definition of their convolution, the basic operation of linear systems theory.
Convolution of two functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ is defined as:

$$ (f * g)(x) = \int du \, f(u)\, g(x - u). $$

Under the Fourier transform defined as

$$ \mathcal{F}[f](\omega) = \int du \, f(u)\, e^{-i\omega u}, $$

this becomes multiplicative:

$$ \mathcal{F}[f * g](\omega) = \mathcal{F}[f](\omega) \times \mathcal{F}[g](\omega). $$

It is good to remember why this is the case, to understand the role of the
correspondence between the transformation formula and the convolution operation.
The basic observation is that the convolution has eigenfunctions, i.e. functions
which do not change their form more than by a multiplicative factor when used in
convolution. These are the complex exponentials, each defined by an amplitude a
and a frequency $\omega$:

$$ e_\omega(x) = a\, e^{i\omega x}. $$

Then we obtain for the convolution of f by an eigenfunction:

$$ \begin{aligned}
(f * e_\omega)(x) &= \int du \, \big[ f(u) \times a\, e^{i\omega (x - u)} \big] \\
 &= \Big( \int du \, f(u)\, e^{-i\omega u} \Big) \times \big( a\, e^{i\omega x} \big) \\
 &= \mathcal{F}[f](\omega) \times e_\omega(x).
\end{aligned} $$

The eigenfunction is unchanged in frequency, but its amplitude changes by
a multiplicative amount $\mathcal{F}[f](\omega)$, which only depends on f and $\omega$. Thus the
Fourier transform is the (multiplicative) eigenvalue of the eigenfunction of the
convolution.

To take advantage of this, it is essential to be able to decompose an arbitrary
function using eigenfunctions, for if this is done the involved convolution
operation becomes a simple frequency-dependent multiplication of transforms. This
is where the Fourier transform plays a second role, for it is precisely
that decomposition.

This is demonstrated in the theory of Fourier transforms, which shows that
(under certain weak conditions) f can be reconstructed from knowing its spectrum
$\mathcal{F}[f]$, by

$$ f(x) = \frac{1}{2\pi} \int d\nu \, \mathcal{F}[f](\nu)\, e^{i\nu x}. $$

So the Fourier transform plays two roles in linear systems theory:

- The Fourier transform is a way of characterizing a signal f by a spectrum
  $\mathcal{F}[f]$ of (complex) amplitudes as a function of frequency in an invertible
  manner;
- such a spectral characterization of two signals f and g is just right for
  convolution: multiplicative combination ($\mathcal{F}[f] \times \mathcal{F}[g]$) produces the
  spectrum of the convolution (f * g) of the two signals.

Those properties form the basis of linear systems theory, and its powerful spectral
representation.

2.4 The Legendre Transform as Spectrum 

We observe that the Legendre transform conforms to a similar pattern. Indeed,
the concept of ‘eigenfunction’ is also valid for its analysis, although this should
now be interpreted in an additive sense. The eigenfunctions of dilation are then
the straight boundaries characterized by slope $\omega$ and intercept a:

$$ e_\omega(x) = a + \omega x, $$

since on dilating those by any function f, a straight boundary
with the same slope will again be the result (although if f is not convex, this
may be a set of straight boundaries; apparently, amplitudes should be permitted
to be multi-valued, but we knew that already). The additive eigenvalue of the
line is computed as:

$$ \begin{aligned}
(f \,\check{\oplus}\, e_\omega)(x) &= \operatorname{stat}_u [ f(u) + a + \omega (x - u) ] \\
 &= \operatorname{stat}_u [ f(u) - \omega u ] + a + \omega x \\
 &= \mathcal{L}[f](\omega) + e_\omega(x).
\end{aligned} $$

The additive eigenvalue of $e_\omega$ is thus precisely the Legendre transform of f. The
analogy with the linear theory is strong indeed:

- The Legendre transform can characterize an umbral function invertibly by a
  ‘spectrum’ of (multi-valued) intercepts as a function of the slope, see eq.(8);
- the additive combination of such spectra $\mathcal{L}[f] + \mathcal{L}[g]$ is the spectrum of the
  tangential dilation of the umbral functions f and g, see eq.(9).

Therefore the essential properties of a Fourier transform vis-a-vis convolution of
signals hold for the Legendre transform vis-a-vis tangential dilation of umbral
functions. We may therefore expect that a lot of concepts which have proved
useful in linear systems theory will be useful techniques in dealing with tangential
dilation, and have applications to touching, collision computation and analysis
of wave propagation.
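
Both properties can be tried out on concave parabolas, for which every quantity
has a closed form. The sketch below is an illustration under stated assumptions:
the functions are single-branched, derivatives are supplied by hand, and the
stationary point of eq.(10) is found by bisection on a fixed interval. For
$f(u) = -u^2/(2a)$ and $g(u) = -u^2/(2b)$ the dilation should be
$-x^2/(2(a+b))$, and dilating f with a line of slope $\omega$ should add
$\mathcal{L}[f](\omega)$ to that line. All names are hypothetical.

```python
def bisect(g, lo, hi, tol=1e-12):
    """Root of a continuous g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if glo * gmid <= 0.0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return 0.5 * (lo + hi)

def tangential_dilation(f, df, g, dg, x, lo=-50.0, hi=50.0):
    """(f dilated by g)(x) = f(u*) + g(x - u*), where f'(u*) = g'(x - u*)."""
    u = bisect(lambda s: df(s) - dg(x - s), lo, hi)
    return f(u) + g(x - u)

def legendre(f, df, omega, lo=-50.0, hi=50.0):
    """L[f](omega) = f(u*) - u* omega with f'(u*) = omega (single branch)."""
    u = bisect(lambda s: df(s) - omega, lo, hi)
    return f(u) - u * omega

def parabola(c):
    """Concave parabola u -> -u^2 / (2c) together with its derivative."""
    return (lambda u: -u * u / (2.0 * c)), (lambda u: -u / c)

def line(alpha, omega):
    """Straight boundary x -> alpha + omega x together with its (constant) derivative."""
    return (lambda x: alpha + omega * x), (lambda x: omega)

f, df = parabola(2.0)
g, dg = parabola(3.0)
e, de = line(0.4, 0.7)
x = 1.3
print(tangential_dilation(f, df, g, dg, x), -x * x / (2.0 * 5.0))
print(tangential_dilation(f, df, e, de, x), legendre(f, df, 0.7) + e(x))
```

In each printed pair the two numbers agree: the parabola parameters add under
dilation, and the line keeps its slope while its intercept shifts by the
Legendre transform of f.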

3 Systems Theories 

The comparison with the eigenfunctions of the convolution and tangential dilation
shows how we should want to interpret the results. There is a conserved
parameter which can be used to index the eigenfunctions: in the case of convolution
it is a frequency, in the case of tangential dilation a slope or tangent
direction. Then there is a second parameter, the amplitude or intercept or support
vector, which changes according to simple arithmetic (multiplication or addition)
completely determined by the function g used to dilate with.

Now that we have a very similar algebraic structure to the systems theory 
of linear convolution, we should be able to develop it along similar lines. We 
sketch this development; again the precise notation in healthy mathematics has 
not caught up (as was the case with Fourier transformations and delta functions, 
which only later obtained their embeddings in the mathematics of distribution 
theory), but we may be confident that it will be fixed in time (by proper math- 
ematicians rather than by us). 



3.1 Delta Functions 

In linear systems theory, the delta function (or impulse) is a convenient concept
to describe sampling. It is simply the identity function of convolution:

$$ (f * \delta)(x) = f(x), \quad \text{for all } x. $$

This implies that its spectrum should be the multiplicative identity. Under the
Fourier transform, the equation transforms to
$\mathcal{F}[f](\omega) \times \mathcal{F}[\delta](\omega) = \mathcal{F}[f](\omega)$, so
that $\mathcal{F}[\delta](\omega) = 1$, independent of $\omega$. Reverse transformation (not
elementary) yields the familiar delta function:

$$ \delta(x) = \frac{1}{2\pi} \int d\omega \, e^{i\omega x} =
\begin{cases} 0 & \text{if } x \neq 0 \\ \text{‘1’} & \text{if } x = 0 \end{cases} $$

(Here the ‘1’ is used as a shorthand to denote that it is not the value of $\delta(x)$
which is 1 at x = 0, but its integral. Yet most systems engineers are used to
thinking of the value as 1, especially since they often work with the Kronecker
delta.) This function is sketched in Fig. 5a.

We can make the same construction in the tangential dilation case, now
demanding:

$$ (f \,\check{\oplus}\, \delta)(x) = f(x) $$

and after Legendre transformation this yields
$\mathcal{L}[f](\omega) + \mathcal{L}[\delta](\omega) = \mathcal{L}[f](\omega)$, so
that $\mathcal{L}[\delta](\omega) = 0$ independent of $\omega$. We may invert this to:

$$ \delta(x) = \operatorname{stat}_\nu [ 0 + \nu x ] = \{ \nu_* x \mid x = 0 \}
 = \begin{cases} \{0\} & \text{if } x = 0 \\ \emptyset & \text{if } x \neq 0 \end{cases}
 = \begin{cases} 0 & \text{if } x = 0 \\ -\infty & \text{if } x \neq 0 \end{cases} $$

The final step converts the set notation to the umbral function notation where an
‘empty’ function value is represented by $-\infty$ (this has no shadow, so it represents
no object point).

Fig. 5. Delta functions in linear theory (a) and contact theory (b), and their
corresponding transforms

The delta function for dilation is sketched in Fig. 5b. As an object, the delta
function is clearly the point at the origin. Being tangentially dilated by a point
therefore reveals the shape of f (as does colliding with a point at the origin).
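
On sampled functions the identity property is easy to try out. The sketch below
uses the globalized (supremum) form of the dilation mentioned in section 2.2, in
which ‘stat’ is replaced by a maximum, so it is a max-plus convolution on a unit
grid rather than the multi-valued operation itself. The names `dilate` and
`delta` are hypothetical.

```python
NEG_INF = float("-inf")

def dilate(f, g):
    """Max-plus dilation h[x] = max_k f[x - k] + g[k], with g centered at index len(g) // 2."""
    r = len(g) // 2
    out = []
    for x in range(len(f)):
        best = NEG_INF
        for k in range(-r, r + 1):
            if 0 <= x - k < len(f):
                best = max(best, f[x - k] + g[k + r])
        out.append(best)
    return out

def delta(radius, shift=0):
    """Dilation delta: 0 at offset `shift`, minus infinity (no object point) elsewhere."""
    g = [NEG_INF] * (2 * radius + 1)
    g[radius + shift] = 0.0
    return g

f = [0.0, 1.0, 4.0, 2.0, -1.0]
print(dilate(f, delta(2)))            # reproduces f
print(dilate(f, delta(2, shift=1)))   # f moved one sample to the right
```

Samples that no point of f can reach come out as minus infinity, which matches
the convention that such a value represents no object point.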

3.2 Representation Using Delta-Functions 

In linear filtering, the representation of a signal as a weighted sum of
eigenfunctions is basic; but when moving towards sampling, another mode of
representation is important. This is the decomposition of a function by means of
delta-functions of appropriate magnitude:

$$ f(x) = (f * \delta)(x) = \int du \, f(u) \times \delta(x - u). $$

Thus f(u) is like the multiplicative amplitude of the delta-function shifted to u.

We can rewrite umbral functions in a similar manner, though now the appropriate
delta-functions have amplitudes which are additive rather than multiplicative:

$$ f(x) = (f \,\check{\oplus}\, \delta)(x) = \operatorname{stat}_u [ f(u) + \delta(x - u) ]. $$

Let us verify this by rewriting the Legendre transform reconstruction formula
into a formula containing a $\delta$-function:

$$ \begin{aligned}
f(x) &= \operatorname{stat}_\nu \big[ \mathcal{L}[f](\nu) + x\nu \big] \\
 &= \operatorname{stat}_\nu \operatorname{stat}_u [ f(u) + (x - u)\, \nu ] \\
 &= \operatorname{stat}_u \operatorname{stat}_\nu [ f(u) + (x - u)\, \nu ] \\
 &= \operatorname{stat}_u \big[ f(u) + \operatorname{stat}_\nu [ (x - u)\, \nu ] \big] \\
 &= \operatorname{stat}_u [ f(u) + \delta(x - u) ].
\end{aligned} $$

It is interesting to see how the stat operator provides the ‘scanning’ of all u,
picking out x = u, in precisely the way the integral does this in the linear case.

Working out the ‘stat’ operator in the final expression gives a clue on how
to treat the derivative of the $\delta$ functions:

$$ \begin{aligned}
f(x) &= \operatorname{stat}_u [ f(u) + \delta(x - u) ] \\
 &= \{ f(u_*) + \delta(x - u_*) \mid f'(u_*) - \delta'(x - u_*) = 0 \} \\
 &= \{ f(u_*) + \delta(x - u_*) \mid f'(u_*) \in \delta'(x - u_*) \}
\end{aligned} $$

Since this should be identical to f(x), the condition should be equivalent to
stating that $u_* = x$, independent of what the function f is. Therefore $\delta'(v)$ must
apparently be defined to be non-zero only when v = 0, and then to contain all
slopes; this feels perfectly reasonable.



3.3 Translations 

The basic operation in the representation by delta-functions is a translation of 
the signal. Fortunately, such an (abscissa) translation is both a convolution and a 
tangential dilation. So this important operation is part of both systems theories. 

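In linear systems, translation over t is represented by convolution with the
shifted delta-function $\delta_t(x) = \delta(x - t)$, of which the Fourier transform is

$$ \mathcal{F}[\delta_t](\omega) = \int du \, \delta(u - t)\, e^{-i\omega u} = e^{-i\omega t}. $$

This gives

$$ f_t(x) = f(x - t) \quad \overset{\mathcal{F}[\cdot]}{\longleftrightarrow} \quad
\mathcal{F}[f_t](\omega) = \mathcal{F}[f](\omega) \times e^{-i\omega t}. $$

In contact systems, translation over t is represented as tangential dilation by the
shifted delta-function $\delta_t(x) = \delta(x - t)$, of which the Legendre transform is
$-\omega t$:

$$ \mathcal{L}[\delta_t](\omega) = \operatorname{stat}_u [ \delta(u - t) - \omega u ]
 = \operatorname{stat}_v [ \delta(v) - \omega v ] - \omega t = -\omega t. $$

This gives

$$ f_t(x) = f(x - t) \quad
\underset{\mathcal{L}^{-1}[\cdot]}{\overset{\mathcal{L}[\cdot]}{\rightleftarrows}} \quad
\mathcal{L}[f_t](\omega) = \mathcal{L}[f](\omega) - \omega t. $$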
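
The contact-side shift rule can be verified numerically for a single-branched
function. This sketch assumes a strictly concave f (here minus cosh, an arbitrary
choice) with a hand-supplied derivative, and finds the stationary point by
bisection on a fixed interval; the helper names are hypothetical.

```python
import math

def bisect(g, lo, hi, tol=1e-12):
    """Root of a continuous g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if glo * gmid <= 0.0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return 0.5 * (lo + hi)

def legendre(f, df, omega, lo=-10.0, hi=10.0):
    """L[f](omega) = f(u*) - u* omega with f'(u*) = omega (single branch)."""
    u = bisect(lambda s: df(s) - omega, lo, hi)
    return f(u) - u * omega

def shifted(f, df, t):
    """The translated function f_t(x) = f(x - t) and its derivative."""
    return (lambda x: f(x - t)), (lambda x: df(x - t))

f = lambda u: -math.cosh(u)
df = lambda u: -math.sinh(u)
t = 1.5
f_t, df_t = shifted(f, df, t)

for omega in (-2.0, 0.0, 0.7):
    print(legendre(f_t, df_t, omega), legendre(f, df, omega) - omega * t)
```

Each printed pair agrees, so a translation over t only adds the linear term
$-\omega t$ to the slope spectrum.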



Note that the computations in both frameworks run completely analogously.

3.4 Bandwidth Limitation

In linear systems theory, band limitation is achieved through multiplying the
spectrum by the ideal bandpass filter

$$ H(\omega) = \begin{cases} 1 & \text{if } |\omega| \leq \omega_0 \\ 0 & \text{otherwise.} \end{cases} $$

Its inverse Fourier transform yields the famous ‘sinc’-function:

$$ \mathcal{F}^{-1}[H](x) = \frac{\sin(\omega_0 x)}{\pi x}. $$

Convolution with this function indeed leads to a limitation of the spectrum of
a signal to the frequencies between $-\omega_0$ and $\omega_0$, limiting the bandwidth, see
Fig. 6a.

Fig. 6. Band limitation functions in linear theory (a) and contact theory (b),
and their corresponding inverse transforms

In tangential dilation, we can achieve bandwidth limitation in a similar manner.
We now need to make an additive bandpass filter, which should be defined
as

$$ H(\omega) = \begin{cases} 0 & \text{if } |\omega| \leq \omega_0 \\ -\infty & \text{otherwise.} \end{cases} $$

Now the inverse Legendre transform (most easily done pictorially) gives a ‘cone’-
function:

$$ \mathcal{L}^{-1}[H](x) = -|\omega_0 x|. $$

Fig. 7. Impulse trains in linear theory (top) and contact theory (bottom), and
their corresponding transforms

Tangential dilation with this function produces a slope-limited function, see
Fig. 6b. Those functions are known as Lipschitz functions, and their theoretical
importance has been recognized in theoretical developments in the mathematical
morphology on umbral functions [1]. Since dilation of a function with a cone
results in a kind of clamping, filling in the locally concave parts while passing
the local maxima unchanged, it is relevant to the analysis of envelopes in signal
processing [11].
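
The slope-limiting effect of a cone is easy to see on samples. The sketch below
again uses the supremum form of the dilation on a unit grid (an assumption; the
multi-valued ‘stat’ form is not implemented), with a cone wide enough to reach
every sample. The names `dilate` and `cone` and the sample values are
hypothetical.

```python
NEG_INF = float("-inf")

def dilate(f, g):
    """Max-plus dilation h[x] = max_k f[x - k] + g[k], with g centered at index len(g) // 2."""
    r = len(g) // 2
    out = []
    for x in range(len(f)):
        best = NEG_INF
        for k in range(-r, r + 1):
            if 0 <= x - k < len(f):
                best = max(best, f[x - k] + g[k + r])
        out.append(best)
    return out

def cone(omega0, radius):
    """Sampled cone function -|omega0 * k| for offsets k = -radius .. radius."""
    return [-omega0 * abs(k) for k in range(-radius, radius + 1)]

f = [0.0, 5.0, 1.0, 0.0, 3.0, 0.0, 0.0]
h = dilate(f, cone(1.0, len(f) - 1))
print(h)
# Successive differences never exceed the slope limit omega0 = 1.
print([h[i + 1] - h[i] for i in range(len(h) - 1)])
```

The two local maxima of f (the values 5 and 3) pass unchanged, while the samples
in between are lifted onto the flanks of the cones.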

3.5 Sampling 

In linear systems theory, there are various theorems on sampling. The most 
surprising of those is that certain signals can be reconstructed completely after 
sampling. 

Commonly sampling is described primarily in the spatial domain, as the
multiplication by a train of impulses. The Fourier transform of this train is a
train of impulses in the frequency domain, with a separation reciprocal to the
separation in the spatial domain, see Fig. 7a:

$$ \mathcal{F}\Big[ \sum_{k=-\infty}^{\infty} \delta_{ak} \Big](\omega)
 = \frac{2\pi}{a} \sum_{k=-\infty}^{\infty} \delta\Big( \omega - \frac{2\pi k}{a} \Big). \tag{12} $$

For tangential dilation, the impulse train is a sum of shifted delta functions, of
which the Legendre transform is a ‘star’ of lines in the slope domain:

$$ \mathcal{L}\Big[ \sum_{k \in \mathbb{Z}} \delta_{ak} \Big](\omega) = \{ -\omega\, a k \mid k \in \mathbb{Z} \}. \tag{13} $$

This is a ‘star’, a union of linear functions through the origin, see Fig. 7b.

3.6 Reconstruction after Sampling

In linear filtering, the spatial sampling of a function f is done through
multiplication by the impulse train (which is a sum of delta functions). In the
frequency domain, this leads to a convolution of the Fourier transform of this train
and the spectrum $\mathcal{F}[f]$, leading to a sequence of additively overlapping copies of
the original spectrum $\mathcal{F}[f]$ (see Fig. 8b). If those do not overlap, which happens if
the original spectrum was band-limited and the sampling was sufficiently fine-grained
(at least twice the highest frequency; this is the Nyquist criterion), one
can reconstruct the original signal by multiplication of its spectrum by the bandpass
filter, see Fig. 8c. In the spatial domain this is done through convolution of
the sampled signal by the sinc-function corresponding to the band limitation.

In tangential dilation, if we use the train of delta functions (made as a sum of
$\delta$-functions) for sampling, this is done by an addition operation. This becomes a
tangential dilation in the slope domain, by the star of lines which is the Legendre
transform of the sum of impulses by eq.(13). This gives a smeared out spectrum
over the directions present in the star, see Fig. 9b. The original spectrum is
now not retrievable by a single global bandpass filter. Rather, we have to select
an appropriate portion around each slope to approximate the original spectrum,
i.e. we have to add a local bandpass filter of an appropriate width. This effectively
approximates the spectral curve with a piecewise linear, locally tangent curve,
with slopes taken from the slopes $-ak$ prescribed by the transform of the impulse
train. Transforming back to the umbral function domain, this implies tangential
dilation by the inverse Legendre transform of the local bandpass filters, which
are cones. This tangential dilation yields precisely a first order interpolation of
the sample points, see Fig. 9c! (The carefully chosen intersection slopes in the
slope domain to select the right portions correspond to the slopes required to
connect the discrete points in the spatial domain.) Therefore linear interpolation
is analyzable within the new systems theory of contact.

Exact reconstruction of a signal from its sampled version is now only possible
if the original sample points were the vertices of a polygon (we have assumed
above that the points were equidistant, for simplicity; this is not required, as the
reader may check). Then the linear interpolation of the sample points obviously
retrieves the original polygonal umbral function.






Leo Dorst and Rein van den Boomgaard 










Fig. 8. Linear systems theory: (a) Original function and spectrum (b) The effect 
of sampling (c) Nyquist reconstruction using convolution 



This is doable if one has only one polygonal umbral function, but when 
combining two such functions, it would be unusual for both to have their vertices 
at the same x-coordinates; one would then need to take the union of the sample 
locations. This is the counterpart of the Nyquist criterion for the sampling of 
tangential dilation operations - it is rather different in form, and seems less 
practical in its consequences. 

3.7 Discretization of Convolution and Tangential Dilation 

An important way of considering discretization is not as sampling of the original 
signals, but as a discretization of their relevant algebraic combination. One then 
desires a discretization method which ‘commutes’ with the basic operation, in a 
natural manner. Denoting discretization by D, we demand in the linear theory 
of convolution 

D(f*g) = {Df)*{Dg), 
and in the contact theory of tangential dilation 



D{f®g) = {Df)®{Dg). 
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Fig. 9. Contact systems theory: (a) Original function and spectrum (b) The 
effect of sampling (c) Piecewise linear reconstruction in tangential dilation 



Such discretizations are more easily designed in the spectral domain, where the
combination operation is simply arithmetical:

$$\mathcal{F}[D(f * g)] = \mathcal{F}[Df] \times \mathcal{F}[Dg].$$

We will only consider the case when D is itself a convolution, since then we can
use the associativity to re-arrange terms. So let $Df = d * f$ for some function $d$.
Then the Fourier transforms are simple: the left hand side is
$\mathcal{F}[d * (f * g)] = \mathcal{F}[d] \times \mathcal{F}[f] \times \mathcal{F}[g]$,
and the right hand side is
$\mathcal{F}[d * f] \times \mathcal{F}[d * g] = \mathcal{F}[d]^2 \times \mathcal{F}[f] \times \mathcal{F}[g]$.
For general signals, equality implies $\mathcal{F}[d]^2 = \mathcal{F}[d]$, so
$\mathcal{F}[d]$ is a projection of functions. Any selection on the spectrum using
a sum of non-overlapping bandpass filters has this property. Taking for example a
train of delta functions in the frequency domain, this implies a convolution by a
widely spaced train of spatial delta functions, making the original signal periodic
but retaining its continuity. It is therefore hardly a discretization. Only by
sampling using a pulse train does one obtain a doubly discrete representation,
discrete (and periodic) in both domains.

Fig. 10. Commutative discretization in contact systems theory: tangent
discretization using a limited slope spectrum
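The projection condition can be checked numerically. The following is a minimal sketch, assuming periodic signals so that convolution becomes circular convolution computed with the FFT; the signal length, the band limit of 5, and the Gaussian comparison filter are illustrative choices, not taken from the text. It compares a 0/1 spectral mask, which commutes with convolution, against a Gaussian low-pass, which does not.

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution of two equal-length real signals via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def discretize(signal, spectrum_of_d):
    """D f = d * f, with d specified through its (real, symmetric) spectrum."""
    return np.real(np.fft.ifft(np.fft.fft(signal) * spectrum_of_d))

rng = np.random.default_rng(0)
n = 64
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Integer frequency indices 0, 1, ..., -2, -1 in FFT order.
freqs = np.fft.fftfreq(n, d=1.0 / n)

# A band selection: spectrum values are 0 or 1, so spectrum**2 == spectrum.
projector = (np.abs(freqs) <= 5).astype(float)

# A Gaussian low-pass: spectrum values lie strictly between 0 and 1.
gaussian = np.exp(-(freqs / 5.0) ** 2)

def commutes(spectrum_of_d):
    lhs = discretize(circ_conv(f, g), spectrum_of_d)
    rhs = circ_conv(discretize(f, spectrum_of_d), discretize(g, spectrum_of_d))
    return np.allclose(lhs, rhs)

commutes_projector = commutes(projector)
commutes_gaussian = commutes(gaussian)
```

With the mask, both sides have the same spectrum because squaring a 0/1 value leaves it unchanged; with the Gaussian, the right hand side is filtered twice.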

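By similar reasoning, the demand

$$\mathcal{L}[D(f \mathbin{\check{\oplus}} g)] = \mathcal{L}[Df] + \mathcal{L}[Dg]$$

may be satisfied by a tangential dilation D. So let $Df = d \mathbin{\check{\oplus}} f$
for some function $d$. Then the left hand side gives
$\mathcal{L}[d \mathbin{\check{\oplus}} (f \mathbin{\check{\oplus}} g)] = \mathcal{L}[d] + \mathcal{L}[f] + \mathcal{L}[g]$,
and the right hand side
$\mathcal{L}[d \mathbin{\check{\oplus}} f] + \mathcal{L}[d \mathbin{\check{\oplus}} g] = 2\mathcal{L}[d] + \mathcal{L}[f] + \mathcal{L}[g]$.
We thus find a sampling $Df = d \mathbin{\check{\oplus}} f$ which needs to satisfy
$2\mathcal{L}[d] = \mathcal{L}[d]$. This is an additive selector, and again a
sum of non-overlapping bandpass filters has the property (or union, if you prefer
the set notation). Since it is a sum, it corresponds to tangential dilation in the
spatial domain.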
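The additivity that this argument relies on can be illustrated on sampled functions. This is a sketch under stated assumptions: the functions are concave, so that tangential dilation reduces to ordinary supremal dilation, and the Legendre transform is taken in the slope-transform form $\max_x (f(x) - \omega x)$. The sign convention, the grids, and the function names are assumptions for illustration.

```python
import numpy as np

def dilate(f, g):
    """(f dilated by g)[x] = max over y of f[y] + g[x - y], on the full support."""
    h = np.full(len(f) + len(g) - 1, -np.inf)
    for y, fy in enumerate(f):
        h[y:y + len(g)] = np.maximum(h[y:y + len(g)], fy + g)
    return h

def slope_transform(f, slopes):
    """S[f](w) = max over x of (f[x] - w * x), for samples at x = 0, 1, 2, ..."""
    x = np.arange(len(f))
    return np.max(f[None, :] - slopes[:, None] * x[None, :], axis=1)

x = np.arange(21)
f = -0.05 * (x - 10.0) ** 2      # concave parabola
g = -np.abs(x - 10.0)            # concave "roof"
slopes = np.linspace(-2.0, 2.0, 9)

lhs = slope_transform(dilate(f, g), slopes)
rhs = slope_transform(f, slopes) + slope_transform(g, slopes)
additive = np.allclose(lhs, rhs)
```

The maximum over x of a maximum over y separates into two independent maxima, which is why the two sides agree for every slope in the chosen spectrum.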

If we now use a train of delta functions in the slope domain, i.e. a discrete
spectrum of slopes (not necessarily equidistant), this leads to a polyhedral
approximation of the spatial domain using those slopes in a star of linear functions
(the derivation is analogous to that in section 3.5). This gives lots of infinite lines
tangent to the function in the spatial domain. Selecting the correct portions of
those in the spatial domain implies that we do a linear interpolation in the slope
domain, leading to a linear tangent approximation in the spatial domain, using
the selected spectrum of tangents, see Fig. 10.*

So this structural discretization produces a polyhedral approximation which
correctly represents the collection of offset tangent hyperplanes. To reconstruct
the object, one also needs to store information on which hyperplanes are
neighbors; one thus obtains a network of tangent hyperplanes. This is a discretization
method which -by design- works well for contact: collision with the
discretization is discretization of the collision.

* This is the full story for convex functions. For functions with inflection points,
separate branches result for each convex and concave section, but there are ways
to connect those - we will not go into this here. It is interesting to note that if such
a function is coarsely sampled (so that concave portions are skipped), one obtains a
linear tangent discrete approximation to the convex hull of the umbral function.
It can be shown that this leads to a simple algorithm which can be performed
fully in the spatial domain, in which one is permitted to do a sorting
on directions and add the contributions of the directions; not only as intercepts,
but also as line lengths, since those are additive as well. Such representations
were first introduced in [5], though without the backing of a systems theory
which suggests the equal-slope-spectrum sampling method to retain closure of
the representation.
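For convex polygons in the plane, sorting on directions and adding line lengths can be sketched as follows. This is an illustrative implementation of the well-known edge-merging construction of a Minkowski sum, not the algorithm of [5] itself; it assumes counter-clockwise vertex lists, and the exact collinearity test assumes integer coordinates.

```python
import math

def minkowski_sum(P, Q):
    """Minkowski sum of two convex polygons given as CCW lists of (x, y)."""
    def edges(poly):
        n = len(poly)
        return [(poly[(i + 1) % n][0] - poly[i][0],
                 poly[(i + 1) % n][1] - poly[i][1]) for i in range(n)]

    def direction(e):
        # Edge direction as an angle in [0, 2*pi).
        return math.atan2(e[1], e[0]) % (2.0 * math.pi)

    # Start from the lowest (then leftmost) vertex of each polygon.
    start_p = min(P, key=lambda v: (v[1], v[0]))
    start_q = min(Q, key=lambda v: (v[1], v[0]))

    merged = []
    for e in sorted(edges(P) + edges(Q), key=direction):
        if merged:
            last = merged[-1]
            cross = last[0] * e[1] - last[1] * e[0]
            dot = last[0] * e[0] + last[1] * e[1]
            if cross == 0 and dot > 0:
                # Same direction: the line lengths simply add.
                merged[-1] = (last[0] + e[0], last[1] + e[1])
                continue
        merged.append(e)

    x, y = start_p[0] + start_q[0], start_p[1] + start_q[1]
    vertices = []
    for dx, dy in merged:
        vertices.append((x, y))
        x, y = x + dx, y + dy
    return vertices

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (1, 0), (0, 1)]
sum_square_square = minkowski_sum(square, square)
sum_square_triangle = minkowski_sum(square, triangle)
```

Edges that share a direction are fused into one longer edge, which is the additivity of line lengths mentioned above; edges with distinct directions are interleaved in angular order.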

3.8 Filtering; Touch Sensing 

Just as the Fourier transform gives us a way to analyze filtering for convolutions 
(i.e. optical blurring), the Legendre transform gives this possibility for touch 
sensors (or collision testing). 

We would have liked to include a section here about the counterpart of Wiener 
filtering in stochastic signal processing, and other techniques to estimate un- 
known transfer functions from a statistically sufficiently rich collection of input 
signals. This would for instance permit the estimation of the unknown atomic 
probe used in scanning tunneling microscopy [6], to ‘de-dilate’ it from the mea- 
surements to obtain a good estimate of the actual atomic surface that was being 
scanned. 

However, for such applications one would really need the full theory which
incorporates the impossibility of intersecting contacts, also in its statistical
aspects. We have not looked into this. In its present form, the analogous filtering
techniques might be useful for the inversion of waves (as in seismic migration
studies), but even that is conjectural.



4 Towards a Systems Theory of Kissing 

The analogy of the Monge patch descriptions of contact to linear systems theory
has led us to consider tangential dilation as a systems theory. We will now
briefly investigate whether this also provides insights into the fuller geometrical
representation of objects and their interactions through contact, and how we
should discretize those. The results are preliminary, but promising.



4.1 Gauss Sphere as Direction Spectrum 

The principle of the slope spectrum is that the function may be considered as a 
collection of tangent planes, characterized by a support function, in an invertible 
manner. 

We have indicated in section 1.2 that we can do this for arbitrary objects, in
a coordinate-free manner, by the support function $\sigma$ as a function of the tangent
direction characterization by n or I. This is the extended Gauss sphere
representation of the ‘Gauss sphere’ of directions. We therefore view the Gauss sphere
as the spectrum of directions, and the support function on it as the amplitude
indicating the strength of presence of that direction in the object. This support
function is, of course, multi-valued. We sketch this in Fig. 11; the geometric
algebra computations are easier to perform than to sketch...

Fig. 11. The support function of an object sketched as a function on the Gaussian
sphere of directions. For convenience in depiction, the outward pointing
normal -n is drawn, rather than n

Combining two objects in a dilation operation may be done as addition of 
their direction spectra. Whether or not this is practical depends on finding a ‘fast 
direction transform’; but viewing the operation in its natural spectral description 
should provide analytic insight in the dilation and collision operations. 



4.2 Point as Delta-Object 

The ‘delta object’ of the dilation operation is an object which is the additive
identity for all directions. It thus has as representation:
$\mathcal{R}[\mathbf{n}] = \mathbf{n} + e_0 \times 0 = \mathbf{n}$.
This is a point at the origin. And indeed, if a boundary A is being dilated with
(or collided by) a point object, the point describes the boundary A.



4.3 Discretization of Collision Computations 

We have seen in the umbral function description that a discretization which
commutes with the dilation is obtained by choosing a (discrete) spectrum of
directions, and encoding the object as a network of offset tangent hyperplanes.

This is also true for geometrical objects. If all objects involved in the dilation
are represented using this same limited spectrum of directions, then the dilation
is simply computable through the addition of the support functions, without
any additional interpolation. Again, discretization of the dilation equals dilation
of the discretization equals addition of the spectrum of support functions, see
Fig. 12. This is a potentially cheap way of evaluating collisions to any desired
accuracy.

Fig. 12. Contact of objects: the tangent hyperplane supports are additive. (a)
Spatial representation, the objects have been discretized using an icosahedral
spectrum of directions; (b) Support functions on the Gauss sphere add up
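The additivity of supports over a shared direction spectrum can be checked directly for convex polygons. The sketch below is a two-dimensional stand-in for the icosahedral spectrum of Fig. 12: it assumes outward unit normals, twelve equally spaced directions, and two small example polygons, all chosen for illustration. The Minkowski sum is formed by brute force from all pairwise vertex sums and its support is compared with the sum of the individual supports.

```python
import numpy as np

def support(points, directions):
    """Support value max over p of <p, n>, one value per direction n."""
    return np.max(points @ directions.T, axis=0)

# A shared, limited spectrum of directions (12 unit vectors in the plane).
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
triangle = np.array([[0, 0], [2, 0], [0, 1]], dtype=float)

# Brute-force dilation: every vertex of one object added to every vertex of the other.
pairwise = (square[:, None, :] + triangle[None, :, :]).reshape(-1, 2)

support_of_sum = support(pairwise, directions)
sum_of_supports = support(square, directions) + support(triangle, directions)
supports_add = np.allclose(support_of_sum, sum_of_supports)
```

Once both objects are stored as support values on the same directions, the dilation costs one addition per direction and needs no interpolation.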



5 Summary 

We have seen that various operations which have the nature of collision or wave 
propagation can be analyzed using a ‘systems theory’. This is based on the real- 
ization that the boundaries of the objects involved can be decomposed according 
to a spectral transform, and that in terms of this spectral transform the com- 
bination operation (wave propagation or collision) becomes a simple arithmetic 
operation, namely addition. The geometric intuition behind both is the same: 
an object boundary can be indexed as the support of its tangent planes, as a 
function of their direction. The spectral transform is the ‘support function on 
the Gauss sphere’ for objects, and the ‘Legendre transform’ for functions. 

Especially the Legendre transform formulation shows that the resulting algebraic
structure is very similar to the linear systems theory of convolution, with
the Fourier transform as the spectrum (see Fig. 13, after [3]). We have shown
how that analogy gives useful techniques with sensible results for the analysis
of collision/propagation of functions. The techniques should be transferable to
the interaction of the boundaries of objects in space. We believe that this will
lead to analytical insights which will be the foundation in the design of efficient
algorithms for wave propagation, robotic collision, and object growing.

Fig. 13. Comparison of the two systems theories



References 

1. van den Boomgaard, R., Smeulders, A. W. M.: The morphological structure of
images: the differential equations of morphological scale space. IEEE Trans. on
Pattern Analysis and Machine Intelligence 16 (1994) 1101-1113

2. Dorst, L.: Objects in contact: boundary collisions as geometric wave propagation.
In: Geometric Algebra: A Geometric Approach to Computer Vision, Quantum and
Neural Computing, Robotics and Engineering, E. Bayro-Corrochano, G. Sobczyk,
eds. Birkhäuser (2000) Chapter 17, 355-375

3. Dorst, L., van den Boomgaard, R.: Morphological signal processing and the slope
transform. Signal Processing 38 (1994) 79-98

4. Dorst, L., van den Boomgaard, R.: The Support Cone: a representational tool
for the analysis of boundaries and their interactions. IEEE PAMI 22(2) (2000)
174-178

5. Ghosh, P. K.: A Unified Computational Framework for Minkowski Operations.
Comput. & Graphics 17(4) (1993) 357-378

6. Hawkes, P. K.: The evolution of electron image processing and its potential debt
to image algebra. Journal of Microscopy 190(1-2) (1998) 37-44

7. Heijmans, H. J. A. M.: Morphological Image Operators. Academic Press, Boston
(1994)

8. Heijmans, H. J. A. M., Maragos, P.: Lattice calculus of the morphological slope
transform. Signal Processing 59 (1997) 17-42

9. Hestenes, D.: The design of linear algebra and geometry. Acta Applicandae
Mathematicae 23 (1991) 65-93

10. Lasenby, J., Fitzgerald, W. J., Doran, C. J. L., Lasenby, A. N.: New Geometric
Methods for Computer Vision. Int. J. Comp. Vision 36(3) (1998) 191-213

11. Maragos, P.: Slope transforms: theory and application to nonlinear signal
processing. IEEE Transactions on Signal Processing 43(4) (1995) 864-877

12. Rockafellar, R. T.: Convex Analysis. Princeton University Press (1972)



An Associative Perception-Action Structure
Using a Localized Space Variant
Information Representation



Gösta H. Granlund



Computer Vision Laboratory, Department of Electrical Engineering,
Linköping University, SE-581 83 Linköping, Sweden
gosta@isy.liu.se



Abstract. Most of the processing in vision today uses spatially invari- 
ant operations. This gives efficient and compact computing structures, 
with the conventional convenient separation between data and opera- 
tions. This also goes well with conventional Cartesian representation of 
data. 

Currently, there is a trend towards context dependent processing in var- 
ious forms. This implies that operations will no longer be spatially in- 
variant, but vary over the image dependent upon the image content. 
There are many ways in which such a contextual control can be im- 
plemented. Mechanisms can be added for the modification of operator 
behavior within the conventional computing structure. This has been 
done e.g. for the implementation of adaptive filtering. 

In order to obtain sufficient flexibility and power in the computing
structure, it is necessary to go further than that. To achieve sufficiently
good adaptivity, it is necessary to ensure that sufficiently complex control
strategies can be represented. It is becoming increasingly apparent that
this cannot be achieved through prescription or program specification
of rules. The reason is that these rules will be dauntingly complex
and cannot be dealt with in sufficient detail.

At the same time that we require the implementation of a spatially vari- 
ant processing, this implies the requirement for a spatially variant in- 
formation representation. Otherwise a sufficiently effective and flexible 
contextual control can not be implemented. 

This paper outlines a new structure for effective space variant processing. 
It utilises a new type of localized information representation, which can 
be viewed as outputs from band pass filters such as wavelets. A unique 
and important feature is that convex regions can be built up from a 
single layer of associating nodes. The specification of operations is made 
through learning or action controlled association. 



1 Introduction 

Most of the processing in vision today uses spatially invariant operations. This
gives efficient and compact computing structures, with the conventional convenient
separation between data and operations. This also goes well with conventional
Cartesian representation of data.

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 48—68, 2000. 

(c) Springer- Verlag Berlin Heidelberg 2000 
Currently, there is a trend towards context dependent processing in various
forms. This implies that operations will no longer be spatially invariant, but vary
over the image dependent upon the image content.

There are many ways in which such a contextual control can be implemented. 
Mechanisms can be added for the modification of operator behavior within the 
conventional computing structure. This has been done e.g. for the implementa- 
tion of adaptive filtering [5]. 

In order to obtain sufficient flexibility and power in the computing structure,
it is necessary to go further than that. To achieve sufficiently good adaptivity,
it is necessary to ensure that sufficiently complex control strategies can be
represented. It is becoming increasingly apparent that this cannot be achieved
through prescription or program specification of rules. The reason is that
these rules will be dauntingly complex and cannot be dealt with in sufficient
detail.

At the same time that we require the implementation of a spatially variant 
processing, this implies the requirement for a spatially variant information rep- 
resentation. Otherwise a sufficiently effective and flexible contextual control can 
not be implemented [2]. 

Most information representation in vision today is in the form of iconic arrays,
representing the pattern of intensity and color or some function of this, such
as edges, lines, convexity, etc. This is advantageous and easily manageable for
stereotypical situations of images having the same resolution, size, and other
typical properties. Increasingly, various demands upon flexibility and performance
are appearing, which make the use of array representation less attractive.

The increasing use of actively controlled and multiple sensors requires a more 
flexible processing and representation structure. The data which arrives from 
the sensor(s) is often in the form of image patches of different sizes, rather than 
frame data in a regular stream. These patches may cover different parts of the 
scene at various resolutions. Some such patches may in fact be image sequence 
volumes, at a suitable time sampling of a particular region of the scene, to allow 
estimation of the motion of objects [6]. The information from all such various 
types of patches has to be combined in some suitable form in a data structure. 

The conventional iconic array form of image information is impractical as it
has to be searched and processed every time some action is to be performed. It
is desirable to have the information in some partly interpreted form to fulfill its
purpose to rapidly evoke actions. Information in interpreted form implies that it
should be represented in terms of content or semantic information, rather than in
terms of array values. Content and semantics imply relations between units of
information or symbols. For that reason it is useful to represent the information
as relations between objects or as linked objects. The discussion of methods for
representation of objects as linked structures will be the subject of most of this
paper, but we can already observe how some important properties of a desirable
representation relate to shortcomings of conventional array representations:

— An array implies a given size frame, which can not easily be extended to
incorporate a partially overlapping frame

— Features of interest may be very sparse over parts of an array, leaving a large
number of unused positions in the array

— A description of additional detail can not easily be added to a particular
part of an array

The following sections of this paper outline a new structure for effective space 
variant processing. It utilises a new type of localized information representation. 
The specification of operations is made through learning or action controlled 
association. 

2 Channel Information Representation 

A continuous representation of similarity requires that we have a metric or dis- 
tance measure between items. For this purpose, information is in the associative 
structure expressed in terms of a channel representation[Q,A\. See Figure 1. 




Fig. 1. Channel representation of some property as a function of match between 
filter and input pattern 



Each channel represents a particular property measured at a particular position
of the input space. We can view such a channel as the output from some
band pass filter sensor for some property [g78a]. An appropriate object evokes
an output from the activated channel, corresponding to the match between the
object presented and the properties of the filter, characterizing the passband of
the channel. This resembles the function of biological neural feature channels.
There are in biological vision several examples available for such properties: edge
and line detectors, orientation detectors, etc. [3,8].

If we view the channel output as derived from a band pass filter, we can
establish a measure of distance or similarity in terms of the parameters of this
filter. See Figure 1. For a conventional, linear simple band pass filter, the phase
distance between the flanks is a constant $\pi/2$. Different filters will have different
band widths, but we can view this as a standard unit of similarity or distance,
with respect to a particular channel filter.
2.1 Sequentially Ordered Channels



There are several envelope functions with the general appearance of Figure 1,
such as Gaussian and trigonometric functions. Functions which are continuous
and have continuous derivatives within the resolution range are of interest.
For the introductory discussion of channel representation, we assume the
representation of a single scalar variable $x$, as an ordered one-dimensional
sequence of band pass function envelopes $x_k$, which represent limited intervals, say
$k - \frac{3}{2} < x < k + \frac{3}{2}$, of a scalar variable $x$. A class of functions which has some
attractive properties for analysis is

$$x_k(x) = p_k(x) = \begin{cases} \cos^2\left(\frac{\pi}{3}(x - k)\right) & \text{if } k - \frac{3}{2} < x < k + \frac{3}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$



The scalar variable $x$ can be seen as cut up into a number of local but
partially overlapping intervals, $k - \frac{3}{2} < x < k + \frac{3}{2}$, where the center of each
interval corresponds to $x = k$. It should be observed that we use the notation of $x$
without subscript for the scalar variable and $x_k$ with subscript for the channel
representation of scalar variable $x$. The channel output signals which belong to
a particular set are bundled together, to form a vector which is represented in
boldface:

$$\mathbf{x} = [\,x_1\ x_2\ \ldots\ x_k\ \ldots\ x_K\,]^T \qquad (2)$$

We assume for conceptual simplicity that the numbers $k$ are consecutive
integers, directly corresponding to the numbers of consecutive channels. This
allows a more consistent treatment and a better understanding of mechanisms.
We are obviously free to scale and translate the actual input variable in any
desired way, as we map it onto the set of channels. An actual scalar variable $\xi$
can be scaled and translated in the desired way

$$x = \text{scale} \cdot (\xi - \text{translation}) \qquad (3)$$

to fit the interval spanned by the entire set of channels $\{x_k\}$. We will later
see how other nonlinear scaling transformations can be made.

With each channel center representing consecutive integers, the distance between
two adjacent channels in terms of the variable $x$ is one unit. From Equation 1
it is apparent that the distance in terms of angle is $\frac{\pi}{3}$ or 60°. We will in
subsequent discussions refer to this as the typical channel distance of $\frac{\pi}{3}$ or 60°.

In Figure 2 we have a one-dimensional set of 13 sequentially ordered channels.
The position of each channel is indicated by the dashed lines. It is designed to
provide a channel representation of scalars within a range $0 \le x \le 10$. To provide
a continuous representation at the boundaries of this interval, the set of channels
is padded with an extra channel at each end. Components from these channels
are required to perform a reliable reconstruction back to a scalar value from the
vector representation. In order to start adapting ourselves to the major purpose
of processing of spatial data, we can view Figure 2 as a one-dimensional image
with a single simple object in the form of a dot. The channels are scaled to unit
resolution between the filter centers, and the center values correspond to values

$$\mathbf{k} = [\,-1\ \ 0\ \ 1\ \ 2\ \ 3\ \ 4\ \ 5\ \ 6\ \ 7\ \ 8\ \ 9\ \ 10\ \ 11\,]^T \qquad (4)$$

Fig. 2. Channel representation of a scalar x = 7, giving
$\mathbf{x} = [\,0\ 0\ 0\ 0\ 0\ 0\ 0\ 0.25\ 1.0\ 0.25\ 0\ 0\ 0\,]^T$

If the set of channels is activated by a scalar x = 7, represented by a point at
position x = 7, we will obtain a situation as indicated in Figure 2. We assume
that the output of a channel is given by the position of the point x = 7 within
its band pass function, according to Equation 1. The channels activated are
indicated by the solid line curves. The scalar x = 7 will produce the vector x as
indicated in Figure 2.
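The encoding of Equation 1 for this 13-channel set can be sketched in a few lines. The function name `encode` and the use of NumPy are illustrative assumptions; the kernel, its support of three units, and the channel centers -1 to 11 follow the text.

```python
import numpy as np

def encode(x, centers):
    """Channel vector of a scalar x using the cos^2 kernel of Equation 1."""
    d = x - np.asarray(centers, dtype=float)      # offset from each channel center
    values = np.cos(np.pi / 3.0 * d) ** 2
    return np.where(np.abs(d) < 1.5, values, 0.0)  # zero outside the support

centers = np.arange(-1, 12)       # 13 channels, including one padding channel per end
vec_7 = encode(7.0, centers)      # the situation of Figure 2
vec_0 = encode(0.0, centers)      # a value at the lower boundary of the range
```

For x = 7 only the channels centered at 6, 7 and 8 respond; for x = 0 the padding channel at -1 is one of the three active channels.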

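Below are a few additional examples which hopefully will shed some light
on the representation, in particular at the boundaries. We still assume a set of
13 channels which are used to represent scalar values in the interval between 0
and 10.

$$\begin{aligned}
x = 0.0:&\quad \mathbf{x} = [\,0.25\ 1.0\ 0.25\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\,]^T \\
x = 3.73:&\quad \mathbf{x} = [\,0\ 0\ 0\ 0\ 0.52\ 0.92\ 0.06\ 0\ 0\ 0\ 0\ 0\ 0\,]^T \\
x = 9.0:&\quad \mathbf{x} = [\,0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0.25\ 1.0\ 0.25\ 0\,]^T \\
x = 10.0:&\quad \mathbf{x} = [\,0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0.25\ 1.0\ 0.25\,]^T
\end{aligned} \qquad (5)$$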


We can clearly see the necessity for padding with extra channels at the boundaries.
Under the conditions stated earlier, we have the following values of $x_k$
within an interval $k - \frac{3}{2} \le x \le k + \frac{3}{2}$:

$$\begin{aligned}
x_k(k - \tfrac{3}{2}) &= \cos^2(-\tfrac{\pi}{2}) = 0 \\
x_k(k - 1) &= \cos^2(-\tfrac{\pi}{3}) = 0.25 \\
x_k(k) &= \cos^2(0) = 1 \\
x_k(k + 1) &= \cos^2(\tfrac{\pi}{3}) = 0.25 \\
x_k(k + \tfrac{3}{2}) &= \cos^2(\tfrac{\pi}{2}) = 0
\end{aligned} \qquad (6)$$

In relation to this, it can be shown that

$$\sum_k x_k(x) = 1.5 \quad \text{if } -\tfrac{1}{2} < x < K - \tfrac{1}{2} \qquad (7)$$

where K is the last channel used for padding. This consequently gives a
margin of 1/2 outside the second last channel. This means that the sum of all
channel contributions over the entire channel set from the activation by a single
scalar x is 1.5, as long as x is within the definition range of the entire set. Related
properties are:

$$x_k(k - 1) + x_k(k) + x_k(k + 1) = x_{k-1}(k) + x_k(k) + x_{k+1}(k) = 1.5 \qquad (8)$$

Most components of x are zero, with only two or three non-zero components 
representing the scalar value x as discussed earlier. 
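The constant sum of 1.5 and the reconstruction of the scalar from its channel vector can both be sketched numerically. The decoding rule below is an assumption for illustration, since the text does not spell out a reconstruction method: it uses the identity $\cos^2\theta = (1 + \cos 2\theta)/2$, so that weighting the channels by $e^{i 2\pi k/3}$ yields a complex number whose phase gives x modulo 3, and the strongest channel resolves the ambiguity. It handles a single stimulus only.

```python
import numpy as np

def encode(x, centers):
    """Channel vector of a scalar x using the cos^2 kernel of Equation 1."""
    d = x - np.asarray(centers, dtype=float)
    return np.where(np.abs(d) < 1.5, np.cos(np.pi / 3.0 * d) ** 2, 0.0)

def decode(vec, centers):
    """Recover a single scalar from its channel vector (illustrative rule)."""
    centers = np.asarray(centers, dtype=float)
    z = np.sum(vec * np.exp(2j * np.pi * centers / 3.0))
    x_mod3 = 3.0 / (2.0 * np.pi) * np.angle(z)      # x modulo 3, in (-1.5, 1.5]
    k_max = centers[np.argmax(vec)]                 # strongest channel
    return k_max + (x_mod3 - k_max + 1.5) % 3.0 - 1.5

centers = np.arange(-1, 12)
samples = np.linspace(-0.4, 10.4, 55)
sums = np.array([encode(x, centers).sum() for x in samples])
recovered = np.array([decode(encode(x, centers), centers) for x in samples])
```

Within the stated range the sum stays at 1.5, and the decoded values agree with the encoded scalars up to floating point error.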

An array may be activated by more than one value or stimulus. In Figure 3 we
have two scalars, at x = 1 and x = 7. It is apparent that as the difference between
the two scalars decreases, there is going to be overlap and interference between
the contributions. This indicates a need to worry about proper resolution, like
for any sampling process. Still, the representation gives us the possibility to keep
track of multiple events within a single variable, without their superimposing,
something which a Cartesian representation does not allow.

2.2 Two-Dimensional Channels 

Most of the information we want to deal with as input is two-dimensional, or
possibly of even higher dimensionality. For that reason we will extend the definition
to two dimensions, x and y:

$$p_{kl}(x, y) = \begin{cases} \cos^2\left(\frac{\pi}{3}\sqrt{(x - k)^2 + (y - l)^2}\right) & \text{if } k - \frac{3}{2} < x < k + \frac{3}{2},\ \ l - \frac{3}{2} < y < l + \frac{3}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

The arrangement of sequential integer ordering with respect to k and l is
similar to the one-dimensional case. The output from a channel is now dependent
upon the distance $d = \sqrt{(x - k)^2 + (y - l)^2}$ from the center of a particular
channel at position (k, l) in the array.
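A point stimulus on a small two-dimensional channel array can be encoded as follows. This is a sketch of Equation 9 only; the 6 by 6 array size, the stimulus position, and the function name are illustrative assumptions.

```python
import numpy as np

def encode_2d(x, y, ks, ls):
    """Activations p_kl(x, y) of Equation 9 for a point stimulus at (x, y)."""
    K, L = np.meshgrid(np.asarray(ks, dtype=float),
                       np.asarray(ls, dtype=float), indexing="ij")
    d = np.sqrt((x - K) ** 2 + (y - L) ** 2)            # distance to each center
    inside = (np.abs(x - K) < 1.5) & (np.abs(y - L) < 1.5)
    return np.where(inside, np.cos(np.pi / 3.0 * d) ** 2, 0.0)

# Channel centers at integer positions k, l = 0..5; stimulus at (2, 3).
grid = encode_2d(2.0, 3.0, range(6), range(6))
center_value = grid[2, 3]       # channel centered on the stimulus
neighbor_value = grid[3, 3]     # channel one unit away
```

The stimulus activates the 3 by 3 neighborhood of channels around it, which is the overlap between transfer functions that a point object requires.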

As we will see later, good functionality requires that there are several non-zero
outputs generated from a sensor array. As we in this case are dealing with
point objects, this requires that there is an overlap between the transfer functions
of the different detectors. When the object is a line, or other spatially extended
object, no overlap is required. Rather we will see that receptive fields of sensors
normally only have to cover parts of the array.

Fig. 3. Channel representation of two scalars at x = 1 and x = 7, giving
$\mathbf{x} = [\,0\ 0.25\ 1.0\ 0.25\ 0\ 0\ 0\ 0.25\ 1.0\ 0.25\ 0\ 0\ 0\,]^T$

So far, we have only dealt with the position dependent component of the
channel band pass function. Generally, there is as well a component dependent
upon some property of the sensor, such as dominant orientation, color, curvature,
etc. Equation 9 will then take on the general form:

$$p_{klm}(x, y, \phi) = p_{kl}(x, y)\, p_m(\phi) = \begin{cases} \cos^2\left(\frac{\pi}{3}\sqrt{(x - k)^2 + (y - l)^2}\right) p_m(\phi) & \text{if } k - \frac{3}{2} < x < k + \frac{3}{2},\ \ l - \frac{3}{2} < y < l + \frac{3}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

As this property is often modular and e.g. representing an angle, it has been
given an argument $\phi$. Because the use of modular channel sets is not restricted
to this application, we will give it a somewhat more extensive treatment.



3 Modular Channel Sets 

There are several situations where it is desirable to represent a modular or cir- 
cular variable, such as angle, in a channel representation. There are two different 
cases of interest: 

— Modular channel distance $\frac{\pi}{3}$

— Modular channel distance j 

Of these, we will only deal with the first one: 
3.1 Modular Channel Distance $\frac{\pi}{3}$

The structure easiest to deal with has a channel distance of $\frac{\pi}{3}$, similarly to
the earlier treatment. In this case, the least complex structure contains three
channels in a modular arrangement:

$$\phi_m(\phi) = p_m(\phi) = \begin{cases} \cos^2\left(\phi - m\frac{\pi}{3}\right) & \text{if } \left|\phi - m\frac{\pi}{3}\right| < \frac{\pi}{2},\quad m = 0, 1, 2 \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$



The scalar variable $\phi$ will be represented by a vector

$$\boldsymbol{\phi} = [\,\phi_0\ \phi_1\ \phi_2\,]^T = \{\phi_m\}, \quad m = 0, 1, 2 \qquad (12)$$

As earlier, we use the notation of $\phi$ without subscript for the scalar variable
and $\phi_m$ with subscript for the channel representation of scalar variable $\phi$. The
channels which belong to a particular set are bundled together, to form a vector
which is represented in boldface, to the extent that this type font is available.

The modular arrangement implies that as the scalar argument increases, it will
not activate a fourth channel but map back into channel 1, which is
equivalent to the dashed channel curve in Figure 4. This is the minimum number
of channels which will provide a continuous representation of a modular variable.
It is for example useful for the representation of orientation of lines and edges. It
can be shown that this is the minimum number of filter components which give
an unambiguous representation of orientation in two dimensions [5]. If we view
the distance between adjacent channels as $\frac{\pi}{3}$ or 60° like in the earlier discussion,
this implies that the total modulus for 3 channels is $\pi$ or 180°. This is well suited
for representation of “double angle” [5] features such as the orientation of a line.
If it is desired to represent a variable with modulus $2\pi$ or 360°, the variable $\phi$ can
be substituted by $\phi/2$ in Equation 11 above. Any different desired modulus can
be scaled accordingly. There are several different ways to express the scaling. In
this presentation we have tried to maintain the argument in terms of the $\cos^2()$
function as a reference.

Assuming a resolution of 10 to 20 levels per channel, this will give a total
resolution of 3° to 6° given modulus 180° and a total resolution of 6° to 12° given
modulus 360°. This is sufficient for many applications. The modular arrangement
is illustrated in Figure 4.

If a higher resolution is desired, more channels can be added in the modular
set as required. There are several ways to express the scaling, such as in constant
modulus or in constant channel argument. The way selected here is in terms of
constant argument of the $\cos^2()$ function. This gives a variable modulus for
the entire system, but makes it easy to keep track of the type of system. The
generalized version becomes:

$$\phi_m(\phi) = p_m(\phi) = \begin{cases} \cos^2\left(\phi - m\frac{\pi}{3}\right) & \text{if } \left|\phi - m\frac{\pi}{3}\right| < \frac{\pi}{2},\quad m = 0, 1, \ldots, M - 1 \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$
Fig. 4. Three component modular channel vector set



The scalar variable $\phi$ will be represented by a vector

$$\boldsymbol{\phi} = [\,\phi_0\ \phi_1\ \cdots\ \phi_{M-1}\,]^T = \{\phi_m\}, \quad m = 0, 1, \ldots, M - 1 \qquad (14)$$
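A modular channel set with constant channel distance $\frac{\pi}{3}$ can be sketched as below. The wrapping of the offset into half a modulus on either side of each center is an assumed way of implementing the modular arrangement; the function name and the choice of M = 6 in the second example are illustrative.

```python
import numpy as np

def encode_modular(phi, num_channels=3):
    """Modular channel vector with channel distance pi/3 (modulus M * pi/3)."""
    spacing = np.pi / 3.0
    modulus = num_channels * spacing
    centers = spacing * np.arange(num_channels)
    # Signed offset to each center, wrapped into [-modulus/2, modulus/2).
    d = (phi - centers + modulus / 2.0) % modulus - modulus / 2.0
    return np.where(np.abs(d) < np.pi / 2.0, np.cos(d) ** 2, 0.0)

at_zero = encode_modular(0.0)            # three channels, modulus pi
at_pi = encode_modular(np.pi)            # wraps back onto the same vector
six_channels = encode_modular(0.0, 6)    # six channels, modulus 2*pi
```

With three channels an argument of $\pi$ produces the same vector as an argument of 0, and with six channels the last channel takes over the role of the left neighbor of the first.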

4 Variable Resolution Channel Representation 

In the preceding discussion we have assumed a constant or linear mapping and 
resolution for the variable in question to the channel vector representation. There 
are however several occasions where a nonlinear mapping is desired. 



4.1 Logarithmic Channel Representation 

In many cases it will be useful to have a representation whose resolution and 
accuracy varies with respect to the value of the variable. As an example, we can 
take the estimated distance z to an object, where we typically may require a 
constant relative accuracy within the range. 

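We can obtain this using a logarithmic mapping to the channel representation.

$$x_k(z) = \begin{cases} \cos^2\left(\frac{\pi}{3}\left(\log_b(z - z_0) - k\right)\right) & \text{if } k - \frac{3}{2} < \log_b(z - z_0) < k + \frac{3}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$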

It is convenient to view the process of scaling as a mapping to the integer 
vector set. There are two cases of scaling which are particularly convenient to 



use: 
— One octave per channel. This can for example be achieved by using a mapping
$x = \log_2(z - z_0)$, where $z_0$ is a translation variable to obtain the proper
scaling for z.

— One decade per two channels. This can for example be achieved by using a 
mapping x = ^°log(z — zq)I2, where zg is a translation variable to obtain 
the proper scaling for z. 
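
The logarithmic encoding and the two scalings listed above can be tried out with a minimal sketch. The function name `log_channels` and its parameters `base` and `channels_per_unit` are illustrative assumptions, not names from the text; the channel shape follows the cos² form of Eq. (15).

```python
import math

def log_channels(z, z0, n_channels, base=2.0, channels_per_unit=1.0):
    """Encode scalar z into cos^2 channels on a logarithmic scale (cf. Eq. 15).

    Requires z > z0. Channel k responds when the mapped value x lies
    within 3/2 channel widths of k.
    """
    x = channels_per_unit * math.log(z - z0, base)   # mapped variable x
    out = []
    for k in range(n_channels):
        d = x - k
        out.append(math.cos(math.pi / 3 * d) ** 2 if abs(d) < 1.5 else 0.0)
    return out

# One octave per channel: doubling (z - z0) shifts the pattern by one channel.
a = log_channels(4.0, 0.0, 8)
b = log_channels(8.0, 0.0, 8)

# One decade per two channels: x = 2 * log10(z - z0).
c = log_channels(10.0, 0.0, 8, base=10.0, channels_per_unit=2.0)
```

With these settings the activation pattern of `b` is that of `a` moved up by one channel, which is the constant relative accuracy property the logarithmic mapping is meant to provide.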



4.2 Arbitrary Function Mapping 

A mapping with an arbitrary function $x = f(z)$ can be used, as long as it is
strictly monotonic. It is possible to employ such a function to obtain a variable
resolution in different parts of a scene, dependent upon the density of features 
or the required density of actions. 

4.3 Foveal Arrangement of Sensor Channels 

A non-uniform arrangement of sensors with a large potential is the foveal
structure; see Figure 5. A foveal window has a high density of sensors with a small
scale near the center of the window. More peripherally, the density decreases at
the same time as the scale or size of the sensors increases. This is similar to the
sensor arrangement in the human retina.

The low level orientation outputs from the sensor channels will be produced 
from the usual procedures. They should have a bandwidth, which corresponds 
to the size of the sensor channel field, as illustrated in Figure 5. This implies 
a representation of high spatial frequencies in the center and low frequencies in 
the periphery. 




Fig. 5. Foveal arrangement of channels in sensor window 



As the computing structure can easily deal with a non-uniform arrangement
of sensors, there is a great deal which speaks in favor of a foveal arrangement of
sensors. It provides a high resolution at the center of the visual field. While the
lower resolution towards the periphery does not provide detailed information, it
is sufficient to relate the high resolution description to its background, as well
as to guide the attentive search mechanism to regions of interest.

5 Arrangement of Channels 

The abstract function of this associative structure is to produce a mapping 
between a set of arbitrarily arranged channels for feature variables, and a set of 
sequentially ordered channels for response variables. This constitutes a process 
of recognition. 

We assume two distinctly different categories of channel representations: 

1. Sequentially ordered channels for response variables 

2. Arbitrarily arranged channels for sensor and feature variables 

What we have been dealing with so far, can be said to imply the first cate- 
gory of response variables. This reflects the fundamental property that response 
states are defined along, at least locally, one-dimensional spaces. We assume an 
availability of consecutive, sufficiently overlapping channels which cover these 
spaces. 

In general, there is no requirement for a regular arrangement of channels, 
be it on the input side or on the output side. The requirement of an orderly 
arrangement comes as we need to interface the structure to the environment, 
e.g. to determine its performance. We typically want to map the response
output channel variables back into scalar variables in order to compare them with
the reference. The mapping back into scalars is greatly facilitated by a regular 
arrangement. 



5.1 Arbitrarily Arranged Sensor Channels 

For sensor channels, we assume an arrangement which is typically two-dimen- 
sional, or in general multi-dimensional. While the response space is assumed to 
be one-dimensional as described above, the sensor or feature space is assumed to 
be populated with arbitrarily arranged detectors, where we have no guarantee 
for overlap or completeness. See Figure 6. As we will see, there is no problem for 
the associative structure to use an arbitrarily arranged array of input channels, 
as long as it is stable over time, because an important part of the learning process 
is to establish the identity of input sensor or feature channels. 

The preferential orientation sensitivity of a sensor is indicated as a line seg- 
ment, and the extent of the spatial sensitivity function is indicated by the size 
of the circle. As indicated in this figure, detectors for orientation may typically 
have no overlap, but rather be at some distance from each other. The reason is 
that an expected object, such as a line, has an extent, which makes it likely that 
it will still activate a number of sensor channels.

Fig. 6. Example of random arrangement of orientation detectors over space

Like any other analysis procedure, this one will not be able to analyze an
entire image of, say, 512 × 512 elements in one single bite. It is necessary to limit the
size of a processing window onto the image. We assume that a sensor map window
contains 40 × 40 = 1.6 · 10³ orientation detectors, distributed as a two-dimensional
array. Each orientation detector contains a combination of an edge and a line
detector to produce a quadrature bandpass output. Detectors are assumed to be
distributed such that we only have one detector for some preferred orientation
within some neighborhood. This will give a lower effective resolution with respect
to orientation over the array, corresponding to around 20 × 20 = 400 orientation
detectors with a full orientation range. Detectors will have to be distributed
in an arrangement such that we do not have the situation that there are only
detectors of a particular orientation along a certain line, something which may
happen with certain simple, regular arrangements.

Given no overlap between sensors, the reader may suspect that there will be 
situations, where an applied line will not give an output from any sensor. This 
is true, e.g. when a line is horizontal or vertical in a regular array. It is however 
no problem to deal with such situations, but we will leave out this case from the 
present discussion. 

The channel representation has an elegant way to represent the non-existence 
of information, which is something totally different from the value 0. This is 
very different from the representation in a common Cartesian array, where all
positions are assumed to have values, which are moreover assumed to be reliable. The channel
representation does not require such a continuity, neither spatially, nor in terms 
of magnitude. This allows for the creation of a more redundant representation. 

6 Feature Vector Set for Associative Machinery

A sensor channel will be activated depending upon how the type of stimulus 
matches, and how its position matches. The application of a line upon an array 
as indicated in Figure 6, will evoke responses from a number of sensors along 
the length of the line.

All sensor channels which we earlier may have considered as different vector 
sets, with different indices, will now be combined into a single vector. We can 
obviously concatenate rows or columns after each other for an array such as 
in Figure 6; we can freely concatenate vectors from different sensor modalities 
one after the other. We will in the ensuing treatment for simplicity assume that 
all sensor channels to be considered, are bundled together into a single sensor 
channel vector set: 



\[ \mathbf{x} = [x_1\ x_2\ \dots\ x_K]^T, \qquad k = 1, \dots, K \tag{16} \]

We can see each sensor channel as an essentially independent wire, carrying 
a signal from a band pass filter, describing some property at some position of 
the image. It is assumed that we have a set of such sensor channels within some 
size frame of interpretation, which is a subset or window onto the image to be
interpreted, which is substantially smaller than the entire image. The frame
of interpretation may in practice contain somewhere between 10^ to 10^ sensor
channels, which is equivalent to the dimensionality $K$ of $\mathbf{x}$, dependent upon
the problem and available computational resources. This vector is very sparse 
however, because most sensors do not experience a matching stimulus, which 
gives the vector a density, typically from 10“^ to 10“^. 

A particular length of line at a particular position and orientation, will pro- 
duce a stimulation pattern reflected in the vector x which is unique. As we will 
see next, this vector can be brought into a form such that it can be associated 
with the state vectors (length, orientation, position) related to it. 

The set of features used for association, derives from the above mentioned 
sensor channels, as illustrated in Figure 7. 

We will use the notation: 

– Sensor channel vector set: $\mathbf{x} = [x_1\ x_2\ \dots\ x_K]^T = \{x_k\}$, $k = 1, \dots, K$

– Feature channel vector set: $\mathbf{a} = [a_1\ a_2\ \dots\ a_H]^T = \{a_h\}$, $h = 1, \dots, H$

The sensor channel vector x is an arbitrary but fixed one-dimensional ar- 
rangement of the outputs from the two-dimensional sensor channel array. The 
sensor channel vector x forms the basis for the feature channel vector, a, which 
is to be associated with the response state. The feature vector can contain three 
different functions of the sensor vector: 

1. Linear components. This is the sensor channel vector itself, or components
thereof. This component will later be denoted simply as $\mathbf{x}$.

2. Autocovariant components. These are product components of type
$(x_1 x_1, x_2 x_2, \dots, x_K x_K)$, which are the diagonal elements of the covariance
matrix. The corresponding vector containing these components will be
denoted as $\mathbf{x}_{\text{auto}}$.

3. Covariant components. These are product components of type
$(x_1 x_2, x_1 x_3, \dots, x_{K-1} x_K)$, which are the off-diagonal elements of the covariance
matrix. The corresponding vector containing these components will
be denoted as $\mathbf{x}_{\text{cov}}$.

Fig. 7. Illustration of steps in going from sensor array to feature vector

The feature vector used for association will be:

\[ \mathbf{a} = [\mathbf{x}^T\ \ \mathbf{x}_{\text{auto}}^T\ \ \mathbf{x}_{\text{cov}}^T]^T \tag{17} \]

of which in general only the last covariant components will be present. Experiments
indicate that the covariant feature components are the most descriptive as
they describe coincidences between events, but the existence of the others should
be kept in mind for various special purposes such as improved redundancy, low
feature density, etc.

From this stage on, we will not worry about the sensor channel vector set $\mathbf{x}$,
and only use feature channel vector set, $\mathbf{a}$. For that reason you will see some of
the indices recycled for new tasks, which will hopefully not lead to any confusion.

Before we go into the rest of the associative structure, and how this feature
vector is used, we will recognize the fact that we can recover the conventional
scalar meaning of data expressed as a channel vector.
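
As a small illustration of the three component types of the feature vector in Eq. (17), the following sketch builds a feature list from a sensor vector. The function name `feature_vector` and its flags are assumptions made for this example; the sensor values are invented. By default only the covariant products are produced, in line with the remark that in general only these are present.

```python
from itertools import combinations

def feature_vector(x, linear=False, auto=False, cov=True):
    """Concatenate the selected component types into one feature list a."""
    a = []
    if linear:
        a.extend(x)                                          # x_k
    if auto:
        a.extend(xk * xk for xk in x)                        # x_k * x_k
    if cov:
        a.extend(xi * xj for xi, xj in combinations(x, 2))   # x_i * x_j, i < j
    return a

x = [0.0, 0.5, 0.0, 1.0]       # sparse sensor vector (made-up values)
a = feature_vector(x)          # covariant components only
a_full = feature_vector(x, linear=True, auto=True, cov=True)
```

A covariant component is non-zero only where two sensors are active at the same time, so a sparse sensor vector gives an even sparser feature vector.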

7 Reconstruction of Scalar Value From Channel Vectors

It should first be made clear that this computing structure is intended for consis- 
tent use of information represented as channel signals as explained earlier. Input 
to a computing unit will have the channel representation, as will normally the 
output. The output from one unit or computing stage will be used as input to 
another one, etc. 

As a system of this type has to interface to the external world, on the input or
the output side, requirements become different. For biological systems there are sensors
available which do give a representation in this form, and output
actuators in the form of muscle fibers can use the channel signal representation
directly.

For technical systems, it will be necessary to provide interfaces which convert
between the conventional high resolution Cartesian signal representation and
the channel representation. We have in the introduction discussed how this is
accomplished for input signals. We will now look at how this can be done for
output signals as well, i.e. signals which will be used to drive a motor or
similar device, or for visualization of system states.

The output from a single channel $u_k$ of a response vector $\mathbf{u}$ will not provide
an unambiguous representation of the corresponding scalar signal $u$, as there will
be an ambiguity in terms of the position of $u$ with respect to the center of the
activated channel $u_k$. This ambiguity can be resolved in the combination with
adjacent channel responses within the response vector $\mathbf{u} = \{u_k\}$. By using a
sufficiently dense representation in terms of channels, we can employ the knowledge
of a particular similarity or distance between different channel contributions.

It can be shown that if the distance between adjacent channels is 60° or less, 
we can easily obtain an approximative reconstruction of the value of u as a linear 
phase. Reconstruction of the scalar value which corresponds to a particular 
response vector u, formally denoted as sconv: 

\[ u_e = \operatorname{sconv}(\mathbf{u}) = \operatorname{sconv}(\{u_k\}), \qquad k = 1, \dots, K \tag{18} \]

can, given the earlier discussion, be implemented in several ways. We will 
however leave out the details of the computation in this context. 
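
Since the details are left out here, the following is only one possible sketch of such a reconstruction, not necessarily the procedure the author has in mind. It assumes cos² channels with unit spacing and argument scaling π/3, as in Eq. (15), so that exactly three adjacent channels respond to any value. For three such channels the weighted sum of u_k exp(i 2πk/3) equals (3/4) exp(i 2πu/3), so the value can be read off as a phase. The helper `encode` is a hypothetical name introduced only for the round trip.

```python
import cmath
import math

def encode(x, n_channels):
    """cos^2 channels with unit spacing and support |x - k| < 3/2."""
    return [math.cos(math.pi / 3 * (x - k)) ** 2 if abs(x - k) < 1.5 else 0.0
            for k in range(n_channels)]

def sconv(u):
    """Decode a scalar from channel vector u using three adjacent channels."""
    # Pick the window of three consecutive channels with the largest total activation.
    start = max(range(len(u) - 2), key=lambda l: u[l] + u[l + 1] + u[l + 2])
    # For cos^2 channels this sum equals (3/4) * exp(i * 2*pi*x / 3).
    z = sum(u[k] * cmath.exp(2j * math.pi * k / 3) for k in range(start, start + 3))
    x_mod3 = cmath.phase(z) * 3 / (2 * math.pi)       # x modulo 3
    centre = start + 1
    # Shift x_mod3 by a multiple of 3 into the window around the centre channel.
    return centre + ((x_mod3 - centre + 1.5) % 3.0) - 1.5

estimate = sconv(encode(3.2, 10))
```

The sketch only handles values whose three supporting channels all exist, i.e. values between 0.5 and n_channels − 1.5, and it assumes a single value is represented in the vector.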

8 System Structure for Training 

The general aspects of training are obviously related to the current large field 
of Neural Networks [7]. Training of a system implies that it is exposed to a 
succession of pairs of samples of a feature vector a and a corresponding response 
vector u. 

There are several ways in which this training can be done, but typically one 
can identify a training structure as indicated in Figure 8. 

The Pseudorandom Training Sequencer supplies transformation parameters
of a training pattern to the system. The most characteristic property of this
is that the output variables are guaranteed to vary continuously. The training
variables have to cover the space over which the system is expected to operate.
The Pseudorandom Training Sequencer is expected to produce its output in
conventional digital formats.

Fig. 8. System structure as set up for training, in interaction with the environment

The training data is input to the External World Simulator or interface.
There it generates the particular transformations or modes of variation for
patterns that the system is supposed to learn. This can be the generation of
movements of the system itself, which will modify the percepts available.

The Geometric Mapper will in the general case produce a two-dimensional 
projection from a three-dimensional world, such as to implement a camera. 

The Receptor to Channel Mapper will in the general case convert an image 
projected onto it, into a parallel set of channels, each channel describing some 
property according to the discussion earlier. 

The training data, representing transformations to the input pattern, is also 
supplied to the Response to Channel Mapper, where it is converted from con- 
ventional Cartesian format to the channel representation of the response state, 
as discussed earlier. This information is supplied directly to the output side of 
the associative computation structure. 



8.1 Basic Training Procedure 

The basic training procedure is to run the Pseudorandom Training Sequencer to 
have it vary its output. This will have an effect onto the external world model 
in that something changes. The response variables out from the Pseudorandom 
Training Sequencer will also be fed to the output of the Computing Structure. 

We will in this discussion assume batch mode training, which implies that
pairs of corresponding feature vectors a and response vectors u are obtained for 
each one of N samples, which form matrices A and U: 



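\[
\begin{cases}
\mathbf{U} = [\mathbf{u}_1\ \mathbf{u}_2\ \dots\ \mathbf{u}_n\ \dots\ \mathbf{u}_N] \\
\mathbf{A} = [\mathbf{a}_1\ \mathbf{a}_2\ \dots\ \mathbf{a}_n\ \dots\ \mathbf{a}_N]
\end{cases}
\tag{19}
\]

These matrices are related by the linkage matrix $\mathbf{C}$:

\[ \mathbf{U} = \mathbf{C}\mathbf{A} \tag{20} \]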



From this matrix equation, the coupling or linkage matrix C can be solved, 
superficially expressed as 



C = U/A (21) 

The feature matrix A may contain tens of thousands of features, represented 
by tens of thousands of samples. This implies that the method of solution has to 
be chosen carefully, in order not to spend the remaining part of the millennium
solving the equation. 

There are now very fast numerical methods available for an efficient solution 
of such systems of linear equations. These methods utilize the sparsity of the A 
and U matrices; i.e. the fact that most of the elements in the matrices are zero. 
Although this is an important issue in the use of the channel representation, 
it is a particular and well defined problem, which we will not deal with in this 
presentation. One of the methods available is documented in a Ph.D. Thesis by 
Mikael Adlers: Topics in Sparse Least Squares Problems [1]. 
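
To make the superficial expression C = U/A slightly more concrete, one standard reading is the unrestricted least squares solution. This is given here only as a reference point under the assumption that A Aᵀ is invertible; it is not the sparse method of [1], and it ignores any constraints one may wish to place on C.

```latex
\[
\min_{\mathbf{C}} \ \lVert \mathbf{U} - \mathbf{C}\mathbf{A} \rVert_F^2
\quad\Longrightarrow\quad
\mathbf{C}\,\mathbf{A}\mathbf{A}^T = \mathbf{U}\mathbf{A}^T
\quad\Longrightarrow\quad
\mathbf{C} = \mathbf{U}\mathbf{A}^T \left(\mathbf{A}\mathbf{A}^T\right)^{-1}
\]
```

With H features and N samples, A Aᵀ is an H × H matrix, which is why forming and inverting it directly becomes impractical for tens of thousands of features and why sparse methods are needed.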

8.2 Association as an Approximation Using
Continuous Channel Functions

What happens in the computing structure during training, is that the response 
channel vector u will associate with the feature channel signal vector a. The 
association implies that the output response is approximated by a linear com- 
bination of the feature channel signals. This is illustrated for an actual case in 
Figure 9. 

We can now compute the approximating function for a particular response 
node k over the sample points n: 



\[ u_{kn} = \sum_h c_{kh} a_{hn} \tag{22} \]

The association during the training implies finding the coefficients $c_{kh}$ which
implement this approximation. We can see that a particular response channel
function is defined over some interval of samples, $n$, from the training set. We can
vary $u$ continuously, and as different channels $\dots, u_{k-1}, u_k, u_{k+1}, \dots$ are activated,
their approximation in terms of similarly activated features $\dots, a_{h-1}, a_h, a_{h+1}, \dots$ can
be computed. The resulting optimization coefficients $c_{kh}$ will constitute
quantitative links in the linkage matrix $\mathbf{C}$ between the input feature side and the
output response side.

Fig. 9. Illustration of procedure to approximate a response channel function, $u_k$,
with a set of feature channel functions, $a_h$, over an interval of sample points $n$

Taken over all response nodes, k, this is written in matrix terms as: 



\[ \mathbf{u}_n = \mathbf{C}\mathbf{a}_n \tag{23} \]

For the entire training set of vectors $\mathbf{u}_n$ and $\mathbf{a}_n$ this is written in matrix form
as before:



U = CA (24) 

Having completed a training procedure for the entire range of values
of $u$, we can change the switch in Figure 8 from the training position, t, to the output position.
After this we can present an unknown pattern with feature vector $\mathbf{a}$, within the
definition range, as input to the system, after which the system will use the
derived linkage matrix $\mathbf{C}$ to compute the actual value of $\mathbf{u}$.



u = Ca (25) 

When the training is completed, the computation of the preceding expression 
for an unknown vector a is extremely fast, due to the sparsity of the vectors and 
matrices involved. 
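
The speed argument can be illustrated with a minimal sketch of a sparse evaluation of u = Ca. The storage layout (a dictionary of columns keyed by feature index) and the name `apply_linkage` are assumptions chosen for this example; the link values are invented.

```python
def apply_linkage(C_cols, a):
    """Compute u = C a for sparse data.

    C_cols maps feature index h -> {response index k: c_kh}.
    a maps feature index h -> a_h for the active features only.
    """
    u = {}
    for h, a_h in a.items():                      # only active features are visited
        for k, c_kh in C_cols.get(h, {}).items():
            u[k] = u.get(k, 0.0) + c_kh * a_h
    return u

C_cols = {0: {0: 0.5}, 3: {0: 0.25, 1: 1.0}, 7: {2: 2.0}}   # made-up links
a = {3: 0.8, 5: 0.1}                                         # sparse feature vector
u = apply_linkage(C_cols, a)
```

The cost is proportional to the number of links attached to the active features, not to the full size of C.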

9 Properties of the Linkage Matrix C

We have related the set of response states U and the corresponding percept 
vectors A with the matrix equation 



U = CA (26) 

This matrix equation does not generally have a unique solution, but it can 
be underdetermined or overdetermined. 

For the system to perform as desired, we require a solution with some par- 
ticular properties: 

1. The coefficients of matrix C shall be non-negative, as this gives a more sparse 
matrix and a more robust system. A traditional unrestricted least squares 
solution tends to give a full matrix with negative and positive coefficients, 
which do their best to push and pull the basis functions to minimize the 
error for the particular training set. A particular output may in this case be 
given by the difference between two large coefficients operating upon a small 
feature function, which leads to a high noise sensitivity. 

2. The coefficients of matrix C shall be limited in magnitude, as this also
gives a more robust system. One of the ways to achieve this is to set elements
of A below a certain threshold value to zero. This is related to the lower
threshold part of the S curve, often assumed for the transfer function of real
and artificial neurons.

3. Matrices A and U are sparse, and the entire system can be dealt with using 
fast and efficient procedures for solution of sparse systems of equations for 
values between two limits. 

4. Coefficients of matrix C which are below a certain threshold shall be elim- 
inated altogether, as this gives a matrix with lower density, which allows a 
faster processing using sparse matrix procedures. If desired, a re-optimization 
can be performed using this restricted set of coefficients. 
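
Properties 1 and 4 above can be illustrated with a small sketch. It uses plain projected gradient descent on dense lists, which is a stand-in chosen for brevity and not the sparse solver referred to in the text; the function name `solve_nonneg` and the parameter `drop_below` are assumptions. Each row of C is fitted subject to non-negativity, and coefficients below a threshold are eliminated afterwards.

```python
def solve_nonneg(U, A, iters=500, drop_below=1e-6):
    """Projected-gradient sketch for min ||U - C A||^2 subject to C >= 0.

    U is K x N, A is H x N (lists of rows); A must not be all zeros.
    """
    K, H, N = len(U), len(A), len(A[0])
    # Gram matrix G = A A^T and cross term B = U A^T.
    G = [[sum(A[h][n] * A[g][n] for n in range(N)) for g in range(H)] for h in range(H)]
    B = [[sum(U[k][n] * A[h][n] for n in range(N)) for h in range(H)] for k in range(K)]
    step = 1.0 / sum(G[h][h] for h in range(H))    # 1/trace(G) <= 1/lambda_max(G)
    C = [[0.0] * H for _ in range(K)]
    for _ in range(iters):
        for k in range(K):
            # Gradient of 0.5 * ||u_k - c_k A||^2 with respect to row c_k.
            grad = [sum(C[k][g] * G[g][h] for g in range(H)) - B[k][h] for h in range(H)]
            # Gradient step followed by projection onto the non-negative orthant.
            C[k] = [max(0.0, C[k][h] - step * grad[h]) for h in range(H)]
    # Eliminate very small coefficients to keep C sparse.
    return [[c if c >= drop_below else 0.0 for c in row] for row in C]

A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]     # H = 2 features, N = 3 samples
U = [[1.0, 0.0, 1.0], [0.0, 2.0, 2.0]]     # generated by C = [[1, 0], [0, 2]]
C = solve_nonneg(U, A)
```

In this toy case the exact links are recovered and the cross links end up exactly zero after thresholding. For realistic problem sizes a dedicated sparse non-negative solver would be needed.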

After the linkage matrix C has been computed, we can obtain the response 
state u as a function of a particular feature vector a as 



u = Ca 



(27) 



9.1 Multiple Response State Variables 

So far we have only discussed the situation for a single response or state vari- 
able u. We will normally have a number of state variables u,v,w,t , .... As we 
change the value of the additional variable v, however, the set of features which 
is involved for a particular value of u will vary, and we can suspect that different 
models would be required. This is true in general, but if feature vectors exhibit a 
sufficiently high degree of locality, the simple model structure proposed will still 
work. In such a case, the solution for three response variables in matrix terms 
can be expressed as 

\[
\begin{cases}
\mathbf{U} = \mathbf{C}^u \mathbf{A} \\
\mathbf{V} = \mathbf{C}^v \mathbf{A} \\
\mathbf{W} = \mathbf{C}^w \mathbf{A}
\end{cases}
\tag{28}
\]



The reason why this works is again the extreme degree of locality of feature 
components. This means that as v varies, new feature components move into 
the mapping and old components move out transparently for the single model 
available. Due to the beauty of the channel representation, this means that
channels which are not active will not disturb the matching process for those
which are active.



10 Applications of Associative Structure 

As the purpose of this paper is to give a description of the principles of the 
associative channel structure, we will in this context only give a few comments 
on results from applications. 

The structure has with great success been used to estimate various properties 
in an image, ranging from description of line segments to structures containing 
corners or in general curvature. There are various approaches which can be used. 

The structure has also been used for a view-centered object description pro- 
cedure, which is able to recognize the object car from several different angles 
and also give an estimate of the view angle. 

As in any other descriptive system, the mapping of certain properties is invariant,
while that of others is not. In the associative procedure, the system will detect
such properties by itself, or it can be guided in the choice of such properties.

11 Concluding Remarks

Learning in any robot system or biological system does not take place in parallel 
over a field of features and responses. Learning takes place along one-dimensional 
trajectories in a response state space. The reason for this is that a system, like 
a human or a robot, can only be at “one place at a time” . As it moves from one 
place to another, which really implies from one state to another, it can only do 
so continuously due to its mass and limited power resources. Consequently, the 
system will move along a one-dimensional, continuous trajectory in a multidimensional
space. This continuity is one of the few hard facts about its world
that the system has at its disposal to bring order into its perception of it, and
it has to make the best possible use of it.

Abstract. We consider the structure of colorimetry, essentially of Graßmann’s
three-dimensional linear space that summarizes metamery (confusion
of spectral distributions) for the human observer. We show that
the definition of an orthonormal basis for this space requires a scalar
product in both the space of physical beams, and the representation
space on which Graßmann’s manifold is mapped. The former of these
scalar products has to be constructed on the basis of considerations of
physics and physiology. The present standards (CIE) are very awkward.
The latter of these scalar products can be chosen for reasons of convenience.
After these choices “color space” becomes a “true image” of the
space of physical beams, apart from the fact that all but three of the infinitely
many dimensions are lost (the metamery). We show that the key
operator of modern colorimetry, Cohen’s “Matrix-R” (the projector on
fundamental space that rejects the “metameric black” part of arbitrary
physical beams) also requires these scalar products for its definition. In
the literature such inner products are (implicitly) assumed with the unfortunate
result that the awkward CIE definition is willy-nilly accepted
as the only possibility.



1 Introduction 

Historically, “colorimetry” is one of the earliest success stories of visual
psychophysics. Following Newton [8], most of the structure was in place by the
mid-19th century, the formal structure being due to Graßmann [4], the empirical and
methodological work to Maxwell [7]. The field was polished off in the 1920’s by
Schrödinger [9] after decisive work by Helmholtz [5]. The only noteworthy
development after this is due to Cohen [3] in the 1970’s.

As the field is presented in the standard texts it is somewhat of a chamber of 
horrors: Colorimetry proper is hardly distinguished from a large number of elab- 
orations (involving the notion of “luminance” and of absolute color judgments 
for instance) and treatments are dominated by virtually ad hoc definitions (full 
of magical numbers and arbitrarily fitted functions). I know of no text where 
the essential structure is presented in a clean fashion. Perhaps the best textbook 
to obtain a notion of colorimetry is still Bouma’s [1] of the late 1940’s, whereas
the full impact of all sorts of vain ornamentation can be felt from a book like 
Wyszecki and Stiles [10]. 

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 69–77, 2000.

© Springer-Verlag Berlin Heidelberg 2000






Jan J. Koenderink and Andrea J. van Doorn



2 The Colorimetric Paradigm 

The basic facts are simple enough. If you look into a beam of radiation with 
radiant power in the range 400-700 nm you experience a patch of light. The 
apparent shape and size of the patch depend on beam geometry; the color on 
its spectrometric properties. I consider only beams of incoherent radiation. Such 
beams can be added and multiplied with non-negative factors through simple 
physical techniques. The “space of beams” $\mathbb{S}$ (say) is thus the non-negative part
of a linear space. When two beams $\mathbf{a}$ and $\mathbf{b}$ yield patches that cannot be
distinguished I will write the fact as $\mathbf{a} \sim \mathbf{b}$. Such “colorimetric equivalence” can
be objectively established. Notice that the observer is not even required to ven- 
ture an opinion as to the “color of the patch” . What makes this all interesting 
is that colorimetric equivalence does by no means imply radiometric identity. 
Of course two radiometrically identical beams are (indeed, trivially) colorimetri- 
cally equivalent. But most colorimetrically equivalent beams (drawn at random 
say) are unlikely to be radiometrically identical. This is the basic phenomenon 
of metamerism. The metamer of any beam $\mathbf{a}$ is the set of all its colorimetrically
equivalent mates. Colorimetry is the science of metamerism; its aim is to
parcellate the space of beams into distinct metamers.

The essential empirical facts are formalized as “Graßmann’s Laws”. These
are idealizations of empirical observations: $\mathbf{a} \sim \mathbf{b}$ implies $\mu\mathbf{a} \sim \mu\mathbf{b}$ for any
(non-negative) scalar $\mu$; $\mathbf{a} \sim \mathbf{b}$ implies $(\mathbf{a} + \mathbf{c}) \sim (\mathbf{b} + \mathbf{c})$ for any beam $\mathbf{c}$;
equivalent beams may be substituted for each other in any combination. A null
beam exists (total darkness) which doesn’t change any beam when you add it to
it. Apart from these generic properties there is the particular fact that no more
than three beams are needed to produce equivalent patches for all others. This is the
human condition of trichromacy.



3 Geometrical Interpretation 

3.1 Gauging the Spectrum 

Maxwell was the first to “gauge the spectrum” . The idea is that the Newtonian 
spectrum yields an exhaustive radiometric description of beams. Since Graßmann’s
Laws imply linearity it is sufficient to investigate the spectral components
(monochromatic beams), then all other beams are treated as linear combinations 
of these. 

Pick a set of three independent beams (the “primaries”) $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ (no
linear combination equivalent to the null beam). Almost any random triple will
suffice. Denote the monochromatic beams of a given, fixed radiant power as
$\mathbf{m}(\lambda)$ ($\lambda$ the wavelength). Then “gauging the spectrum” consists of finding three
“color matching functions” $a_i(\lambda)$ ($i = 1, 2, 3$) such that
$(a_1(\lambda)\mathbf{p}_1 + a_2(\lambda)\mathbf{p}_2 + a_3(\lambda)\mathbf{p}_3) \sim \mathbf{m}(\lambda)$.
This is done in a wavelength by wavelength fashion. Notice
that the color matching functions are not (necessarily) non-negative throughout:
Graßmann’s Laws enable you to handle that because formally $\mathbf{a} - \mathbf{b} \sim \mathbf{c}$
implies $\mathbf{a} \sim \mathbf{b} + \mathbf{c}$. In practice one samples the spectral range at about a
hundred locations.

Consider any beam $\mathbf{s}$ given by the radiant spectral density $s(\lambda)$. Then
$(c_1\mathbf{p}_1 + c_2\mathbf{p}_2 + c_3\mathbf{p}_3) \sim \mathbf{s}$, where $c_i = \int a_i(\lambda) s(\lambda)\, d\lambda$.
This is what “gauging the spectrum” buys you. The coefficients $c_i$ are the “color coordinates” of the patch
caused by the beam $\mathbf{s}$. They indicate a point $\mathbf{c}$ (the “color” of the beam $\mathbf{s}$) in
three dimensional “color space” $\mathbb{C}$.
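
As a numerical illustration of the color coordinates, the integral can be replaced by a sum over the sampled spectral range. The function name `color_coordinates` is an assumption, and the matching functions below are invented numbers on a four-point grid, not measured or CIE data.

```python
def color_coordinates(matching, spectrum, d_lambda):
    """Riemann-sum estimate of c_i = integral of a_i(lambda) s(lambda) d lambda."""
    return [sum(a * s for a, s in zip(a_i, spectrum)) * d_lambda for a_i in matching]

# Hypothetical color matching functions sampled at 400, 500, 600, 700 nm.
# These numbers are invented for illustration and are not CIE data.
matching = [
    [0.0, 0.1, 0.6, 0.3],
    [0.1, 0.8, 0.4, 0.0],
    [0.9, 0.2, 0.0, 0.0],
]
s1 = [1.0, 0.0, 0.0, 0.0]
s2 = [0.0, 2.0, 1.0, 0.0]
c1 = color_coordinates(matching, s1, 100.0)
c2 = color_coordinates(matching, s2, 100.0)
c_sum = color_coordinates(matching, [p + q for p, q in zip(s1, s2)], 100.0)
```

The coordinates of the summed beam equal the sum of the coordinates of its parts, which is the linearity that Graßmann’s Laws express.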

Of course the “color” changes when you swap primaries. You easily show
that the new coordinates are a linear transformation of the old ones, the transformation
being determined by the new triple of primaries and the (old) color
matching functions (simply express the color of the new primaries in terms of
the old system). This is the reason why textbooks tell you that “color space is 
only affine” and often make a show of plotting color coordinates on differently 
scaled oblique axes. 

3.2 The Structure of Colorimetry 

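Notice that $c_i = \int a_i(\lambda) s(\lambda)\, d\lambda$ defines the color coordinates in terms of linear
transformations $\langle \chi_i, \mathbf{s} \rangle$ of the radiant power spectrum of the beam. The $\chi_i$
are elements of the dual space $\mathbb{S}^*$ of the space of beams $\mathbb{S}$ ($\mathbb{S}^*$ is the space
of linear functionals on $\mathbb{S}$) and $\langle \cdot, \cdot \rangle$ denotes the contraction of an element on
a dual element. Thus
$(\langle \chi_1, \mathbf{s} \rangle \mathbf{p}_1 + \langle \chi_2, \mathbf{s} \rangle \mathbf{p}_2 + \langle \chi_3, \mathbf{s} \rangle \mathbf{p}_3) \sim \mathbf{s}$.
The relation $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\} \mapsto \{\chi_1, \chi_2, \chi_3\}$ is empirically determined through the gauging of
the spectrum.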

One fruitful way to understand the geometrical structure of colorimetry is 
due to Wyszecki and (especially) to Cohen [3]. Think of the space of beams 
as the direct sum of a “fundamental space” $\mathbb{F}$ and a “black space” $\mathbb{B}$, thus
$\mathbb{S} = \mathbb{F} + \mathbb{B}$. All elements of the black space are literally invisible, thus causally
ineffective. Elements of fundamental space are causally maximally effective, that
is to say, colorimetric equivalence implies physical identity. Any beam $\mathbf{s}$ (say)
can be written as $\mathbf{s} = \mathbf{f} + \mathbf{b}$ and the metamer of $\mathbf{s}$ is $\mathbf{f} + \mathbb{B}$, i.e., let the black
component range over the full black space. Fundamental space has to be three
dimensional and is isomorphic with color space $\mathbb{C}$.

Apply this to the primaries, i.e., $\mathbf{p}_i = \mathbf{f}_i + \mathbf{b}_i$ (say). Clearly the black components
are irrelevant: You may replace any primary with one of its metameric
mates, it makes no difference. The fundamental components $\mathbf{f}_i$ are what matters,
they form a basis for color space. Of course you know only the $\mathbf{p}_i$, not the $\mathbf{f}_i$
though. By picking the primaries you automatically select a basis for $\mathbb{F}$, only
you don’t know it! 

3.3 Metric in the Space of Beams 

In order to proceed you need additional structure, most urgently a metric or a 
scalar product in the space of beams. In many treatments (even Cohen’s otherwise
exemplary work) the existence of a scalar product on $\mathbb{S}$ is simply taken for
granted, yet there are quite a few problems involved. Without a scalar product
in both $\mathbb{S}$ and $\mathbb{C}$ you cannot introduce the transpose of a linear map (frequently
done in the literature without so much as the drop of a hat) for instance.

A metric for the space of beams presupposes a decision on how to represent 
beams. The conventional way is through radiant power spectral density as a
function of wavelength. Yet wavelength is irrelevant (should one take the vacuum
wavelength or that in the aqueous medium of the retina?) and photoreceptors
count photon absorptions rather than integrate radiant power. For the interac- 
tion with photopigment it is the photon energy that is relevant. Once absorbed, 
the effect of any photon is like that of any other (i. e., after absorption photon 
energy is irrelevant, the “Law of Univariance”). Thus only a spectral description 
in terms of photon number density as a function of photon energy makes any 
physiological sense. 

Photon energy is still an awkward scale: Consider the notion of a “uniform
spectrum”. If you take constant spectral photon number flux distribution the
result will depend on the particular unit (should you use electronvolts or ergs?).
Clearly the notion of a “uniform spectrum” should not depend on such accidental
choices! The only way to rid yourself of this problem is to take the uniform
spectrum on a logarithmic photon energy scale. The spectral density $d\varepsilon/\varepsilon$ is
invariant against changes of the units. The essential arguments are especially
well laid out by Jaynes [6].

Thus I will write the basic colorimetric equations as $c_i = \int s(\varepsilon)\, g_i(\varepsilon)\, d\varepsilon/\varepsilon$,
where $s(\varepsilon)$ denotes the spectral photon number flux density and the $g_i$ are (new!)
color matching functions.

With the representation in place you still need to define a scalar product. 
This again is a knotty problem. Various choices appear reasonable. The choice 
should be made according to the aspects of the relevant physics. Here I will settle
on $a \cdot b = \int a(\varepsilon)\, b(\varepsilon)\, d\varepsilon/\varepsilon$.
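The unit invariance claimed for the logarithmic photon energy scale can be checked numerically. The following is a minimal sketch, not part of the original treatment: the grid, the two Gaussian-shaped spectra, and the helper name `log_energy_dot` are assumptions chosen for illustration, and the integral is approximated by the trapezoidal rule in the variable u = ln(energy).

```python
import numpy as np

def log_energy_dot(a, b, energy):
    """Approximate a . b = integral of a(e) b(e) de/e (trapezoidal rule in u = ln e)."""
    u = np.log(energy)
    y = a * b
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

# Hypothetical grid over the visible range, in electronvolts (log-spaced).
energy_ev = np.geomspace(1.6, 3.3, 400)

# Two made-up smooth spectra (illustration only).
a = np.exp(-0.5 * ((np.log(energy_ev) - np.log(2.2)) / 0.10) ** 2)
b = np.exp(-0.5 * ((np.log(energy_ev) - np.log(2.5)) / 0.15) ** 2)

dot_ev = log_energy_dot(a, b, energy_ev)

# Same spectra with energies expressed in ergs: the constant factor drops out of de/e.
EV_TO_ERG = 1.602176634e-12
dot_erg = log_energy_dot(a, b, energy_ev * EV_TO_ERG)
```

Both values agree up to rounding, because rescaling the energy axis only shifts ln(energy) by a constant.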



3.4 Again: The Structure of Colorimetry 

With the scalar product in the space of beams in place it is possible to venture 
some further advances. For instance, you can replace the basis of dual space S* 
with a “dual basis” of S in the classical sense. Notice that to any dual vector $\sigma$
(say) corresponds a unique vector s (say) when you require that $\langle \sigma, x \rangle = s \cdot x$
for any vector x. Let the vectors corresponding to the $\chi_i$ be denoted $g_i$. Then
$\{g_1, g_2, g_3\}$ is the dual basis of $\{f_1, f_2, f_3\}$ in fundamental space F, that is to
say, $s = (g_1 \cdot s) f_1 + (g_2 \cdot s) f_2 + (g_3 \cdot s) f_3$. (We also have the dual relation
$s = (f_1 \cdot s) g_1 + (f_2 \cdot s) g_2 + (f_3 \cdot s) g_3$.)

Let me write the dual basis as $G = \{g_1, g_2, g_3\}$, i.e., a matrix with columns
equal to the color matching functions. Its Grammian (matrix with coefficients
$G_{ij} = g_i \cdot g_j$) is $G^T G$. It is a 3x3 symmetric, nonsingular matrix. You can use
it to construct the basis of fundamental space: $F^T = (G^T G)^{-1} G^T$, where
$F = \{f_1, f_2, f_3\}$. Thus the introduction of the scalar product has enabled us to find
the basis of fundamental space induced by the (arbitrarily chosen!) primaries.
This is real progress. 
Notice that $g_i \cdot s = \int s(\varepsilon)\, g_i(\varepsilon)\, d\varepsilon/\varepsilon$, that is to say, the coordinates of the
dual basis vectors of fundamental space are the color matching functions.
“Duality” is expressed by the relations $(F^T F)^{-1} = G^T G$ and $G^T F = F^T G = I_3$.

Fig. 1. In the upper row I show the spectra of two quite different sets of
primaries. One set consists of monochromatic beams, the other of band pass spectra.
In the middle row I plot the F basis, in the bottom row the corresponding G
bases



Finally, $f = F G^T s$, for an arbitrary beam s with fundamental component f.
Notice that thus $f = G (G^T G)^{-1} G^T s$. The matrix $P_F = G (G^T G)^{-1} G^T$ is
symmetrical, has rank 3, trace equal to 3 and satisfies $P_F^2 = P_F$. It is the
projection operator in S on fundamental space F. Because of that it doesn’t depend
on the arbitrary choice of the primaries. You may easily check this explicitly by
changing to another set of primaries. It is an invariant, complete description, the
“holy grail” of colorimetry!
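The explicit check suggested above can be sketched numerically. This is only an illustration under stated assumptions: the beams are discretized into n samples on a grid for which the scalar product reduces to the ordinary dot product, the color matching functions are random stand-ins rather than measured data, and the names `fundamental_basis` and `projector` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # number of spectral samples (hypothetical discretization)

# Columns of G play the role of the color matching functions (random stand-ins).
G = rng.normal(size=(n, 3))

def fundamental_basis(G):
    # F = G (G^T G)^{-1}, equivalently F^T = (G^T G)^{-1} G^T
    return G @ np.linalg.inv(G.T @ G)

def projector(G):
    # P_F = G (G^T G)^{-1} G^T = F G^T
    return fundamental_basis(G) @ G.T

F = fundamental_basis(G)
P = projector(G)

# A change of primaries mixes the color matching functions by a nonsingular
# 3x3 matrix A (this particular A has determinant 7).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
P_other = projector(G @ A)
```

With these definitions `G.T @ F` is the 3x3 identity, `P` is symmetric and idempotent with trace 3, and `P_other` coincides with `P` up to rounding, which is the invariance stated in the text.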

The projection operator is Cohen’s “Matrix-R”. From the present derivation
its geometrical nature is immediately clear: $P_F = F G^T = G F^T$. In plain words:
The fundamental spectra are simply linear combinations of the fundamental
spectra of the primaries, the coefficients being the coordinates of the color.

I show examples in figures 1 and 2. In figure 1 I consider two quite different
sets of primaries and find the dual bases for each of them. Notice how these bases 
turn out to be quite different. However, when you calculate Cohen’s matrix R 
from either set of bases you obtain the identical result (figure 2). 




Fig. 2. A density plot of “Cohen’s Matrix R”. Either basis of figure 1 yields the 
identical result. The projector on fundamental space is the main invariant of 
colorimetry. Matrix elements are labelled by photon energy in eV 



4 True Images of Color Space 

The usual textbooks tend to (over-)stress the point that color space is “affine”, 
meaning that arbitrary linear transformations are irrelevant. Often they show 
figures in oblique, unequally divided axes to drive the point home. This is really 
counterproductive. “Color space” in the classical sense is an image of fundamen- 
tal space. Why not try to construct an image that is as “true” as possible? Once 
you have a metric in the space of beams that question makes sense. 

There are really two different issues here, one has to do with the structure of 
fundamental space, the other with “representation space” . By the latter I mean 
that an “image” is usually drawn on some canvas. For instance, Cartesian graph 
paper is usually preferred over arbitrary, oblique, unequally divided axes and 
rightly so. It is not different for color space which is an image of fundamental 
space: The most convenient “canvas” is obviously three dimensional Cartesian 
space referred to an orthonormal basis. Notice that there is nothing “deep” going 
on here. Yet the point (though never mentioned) is important enough, because it
means you already have a scalar product in color space, namely by choice. Since 
you already have a scalar product in fundamental space (induced by the scalar 
product in the space of beams), the notion of “true image” is well defined. An 
orthonormal basis in fundamental space should map on an orthonormal basis in 
representation space. In the conventional representation (the CIE convention [2]) 
this is far from being the case though: the ratios of the lengths of the basis vectors 
are 1:0.97:0.43, whereas they subtend angles of 142°, 106° and 82°. Hardly a 
pleasant basis! 

The standard way to construct a true image is to use the singular value
decomposition (SVD) of the linear map, in this case of G. Thus I write
$G = V^T W U$. The matrix U is the desired orthonormal basis of fundamental space,
thus the problem is solved. The matrix W is a diagonal matrix containing the
“singular values”. There exist three nonvanishing singular values. Due to the
arbitrary choice of primaries the singular values are unlikely to be equal, thus
revealing the “distortion” of the original representation. (In the CIE convention
the singular values are in the ratios 1:0.81:0.29.) The matrix $V^T$ is just an
isometry in representation space, thus it doesn’t spoil the “truthfulness” of the
image.
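A small numerical sketch of this construction follows. It is an illustration under assumptions, not the computation behind the figures: the color matching functions are random stand-ins, the scalar product is taken as the plain dot product on the sample grid, and the decomposition is applied to the map from beams to colors (the matrix `G.T`), which is one reading of the orientation intended in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
G = rng.normal(size=(n, 3))      # stand-in color matching functions as columns

# Thin SVD of the map S -> C, written G^T = V^T W U in the notation of the text.
# np.linalg.svd returns (left factor, singular values, right factor).
Vt, w, U = np.linalg.svd(G.T, full_matrices=False)
# Vt: 3x3 isometry, w: three singular values, U: 3 x n with orthonormal rows.

def canonical_color(s):
    """Coordinates of beam s with respect to the orthonormal basis U of F."""
    return U @ s

s = rng.normal(size=n)                        # an arbitrary beam
f = G @ np.linalg.solve(G.T @ G, G.T @ s)     # its fundamental component P_F s
```

The canonical coordinates have the same Euclidean length as the fundamental component f, so the image is "true" in the sense of the text, while the unequal entries of `w` measure the distortion of the original representation.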

In the “true image” a unit hypersphere in the space of beams maps upon 
a three dimensional unit sphere in color space. Infinitely many dimensions are 
simply lost in the image, but at least the ones that are preserved are represented 
truthfully. I illustrate this in figure 3. Images in the color spaces for the two sets 
of primaries introduced in figure 1 of an (infinite dimensional!) hypersphere in
the space of beams turn out to be quite different triaxial ellipsoids. When I find 
canonical bases for either case (straight SVD) these turn into spheres that differ 
only by a rotation in (undeformed) color space. Of course arbitrary rotations 
don’t deform color space, thus you may pick a convenient orientation according 
to some idiosyncratic criterion. 

5 Conclusion 

Although the linear structure was essentially understood by Maxwell [7] and
Grassmann [4] around the 1850’s, a geometrical interpretation had to wait till
Cohen’s [3] work in the 1970’s. Cohen’s seminal work has yet to be absorbed
into the textbooks. Current texts rarely approach the level of sophistication of
Schrödinger [9] who wrote in the 1920’s.

What is lacking even in Cohen’s treatment of colorimetry is a clear geomet- 
rical picture. For instance, the fundamental invariant of colorimetry “Cohen’s 
Matrix-R” cannot be defined without a scalar product in the space of beams. 
Cohen (implicitly) used the Euclidean scalar product in the space of radiant
power spectra on wavelength basis. The point of our discussion is that one has 
to make an explicit choice here. The choice makes a difference, because a space 
of spectral photon number density on photon energy basis (for example) is not 
linearly related to the conventional radiometric choice. 

Fig. 3. Consider a hypersphere in the space of beams. In the top row I have plotted
its image in the color spaces of the two bases introduced in figure 1. Notice that
they are different and hardly spherical. Both color spaces yield deformed (thus
misleading) images of fundamental space. In the bottom row I show the same
configuration in the canonical color spaces obtained by straight SVD. These differ
only by an isometry (rotation about the origin of color space). Both are “true
images” of fundamental space, they merely show views from different directions



That “color space” also presupposes a “representation space” with a
conventional scalar product (e.g., the standard Euclidean one) is a never acknowledged,
but necessary element of Cohen’s analysis. For instance, the conventional “color
matching matrix” G is a matrix with the color matching functions as columns.
The color c of a beam s (say) is $c = G^T s$. Thus $G^T$ occurs as the map from
S to C. In Cohen’s Matrix-R the transpose of this map (G) also occurs. It is a
map from C to S. Although Cohen defines the transpose via swapping of rows
and columns of the matrix associated with the map, the geometrical definition
is via $x \cdot G^T y = G x \cdot y$. Here $x \in C$ and $y \in S$, thus the first scalar product
is taken in C, the second in S. Thus one needs to settle on scalar products in
both spaces. One choice has to be decided on conventional grounds (essentially
convenience), the other on conceptual grounds (physics and physiology). But the
choices have to be made explicitly.

Abstract. We propose a novel method to calculate invariants of color
and multicolor nD images. It employs an idea of multidimensional
hypercomplex numbers and combines it with the idea of Fourier-Clifford-Galois
Number Theoretical Transforms over hypercomplex algebras,
which reduces the computational complexity of a global recognition
algorithm from $O(knN^{n+1})$ to $O(kN^n \log N)$ for nD k-multispectral images.
From this point of view the visual cortex of a primate’s brain can be
considered as a “Fast Clifford algebra quantum computer”.



1 Introduction 

The moment invariants have found wide application in pattern recognition since
they were proposed by Hu [1]. Traditionally, they have been used to describe
the geometrical shapes of different objects. These invariants represent
fundamental geometrical properties (e.g., area, centroid, moment of inertia,
skewness, kurtosis) of geometrically distorted images. Low-order moments are
related to the global properties of the image and are commonly used to determine
the position, orientation, and scale of the image. Higher-order moments contain
information about image details. The image invariants are constructed in two
steps [1]-[9]. At the first step the multi-index moments of the image
$f(i_1, \dots, i_n)$ are computed in the form:

$$m_{(p_1,\dots,p_n)} = \sum_{i_1=0}^{N-1} \cdots \sum_{i_n=0}^{N-1} f(i_1,\dots,i_n)\, i_1^{p_1} \cdots i_n^{p_n}, \qquad (1)$$

where $p_1, \dots, p_n = 0, 1, \dots, N-1$. At the second step invariants
$\mathcal{I}n_{(p_1,\dots,p_n)}\{f\}$ are computed as spectral coefficients of the Kravtchook
transform of moments [10]. There are two drawbacks in this algorithm.

- First, fast calculation procedures of moments are not known today. The
direct computation of the moments is too expensive in computation load.
For example, for an nD window $W^n(N)$ containing $N^n$ grey-level pixels the
computational complexity is $O(nN^{n+1})$ multiplications and additions.
In many cases, however, especially in real-time industrial applications, the
computation speed is the main limitation.
In the literature, authors usually discuss fast computations for the moment
invariants of binary images (see review [11] and [12]). The authors usually use
Green’s theorem to transform the double integrals in the 2-D domain into
the curve integral along the boundary of this binary domain. The computational
complexity in this case is reduced to $O(N)$. However, for the grey-level image
intensity function such algorithms are not known.

- Secondly, it is not possible to estimate the moments of high order with
fixed-point arithmetic. These moments and their invariants have a very large
dynamic range. Such moments and the invariants corresponding to them are said to
be (in terms of quantum mechanics) unobserved. Hence, a computer with
fixed-point arithmetic cannot observe small details of the image. Moreover,
these moments are sensitive to noise. The number of the observed moments
and invariants can be increased using floating-point arithmetic. However, in
this case approximation errors may become comparable to the values of the
high-order moments. This means that the high-order moment invariants are not
“good” image features.

The second drawback cannot be removed on computers with finite capacity.
This means that we have to use another arithmetic to observe the high-order
moments.

We propose to use modular arithmetic of the Galois field to develop a fast
calculating algorithm for the low- and high-degree moments and invariants.
Here a notion of modular invariant is introduced for the first time. The new
moments exhibit some useful properties. First, the dynamic range is the same for
all moments. It can help us to overcome the diminishing problem of higher-order
moments which occurs when other moment invariants are used. We can use
the Fourier-Clifford-Galois Number Theoretical Transforms (FCG-NTTs) for
the fast evaluation of low- and higher-order moment invariants, which reduces
the computational complexity of the recognition of nD grey-level, color and
multicolor images. Modular invariants of hypercomplex-valued multicolor images
can be calculated on the quantum computer working in modular Z/Q-arithmetic.



2 Fast Calculation Algorithms of Real-Valued Invariants
Based on Fourier-Galois NTTs

2.1 Modular Images and Moments 

Moments and invariants are calculated based on the image models. The classical
image model is a function defined on a window $W^n(N) = [0, N-1]^n$ with values
either in the real numbers field or in the complex numbers field. When digital
computers appeared it became perfectly clear that the result of any calculation
can be only a rational number or an integer. Hence, it can be considered (up to
the constant multiplier $2^m$, where m is the capacity of an analog-to-digital converter
(ADC)) that an image has its values in the ring of integers:
$f(i_1, i_2, \dots, i_n) : W^n(N) \to Z$. If an image $f(i_1, i_2, \dots, i_n)$ has $2^m$ gray-levels
then there are no principal limits to consider an image mathematical model as a
function having its values in the finite ring Z/Q, i.e. as
$f(i_1, i_2, \dots, i_n) : W^n(N) \to Z/Q$, if $Q > 2^m$. This model also makes it possible
to operate with the pixels of an image according to Z/Q-arithmetic laws. The
Z/Q-valued function is called a modular image. Such images form the space
$L(W^n(N), Z/Q)$.

If 2D images are processed on computer then intermediate and final results 
are expressed as integers on fixed point computers. This allows us to introduce 
new moments and invariants. 

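Definition 1 [13]-[21]. Functionals

$$\mathcal{M}_{(p_1,p_2,\dots,p_n)} := m_{(p_1,p_2,\dots,p_n)} \pmod Q = \sum_{i_1=0}^{N-1} \cdots \sum_{i_n=0}^{N-1} f(i_1, i_2, \dots, i_n)\, i_1^{p_1} i_2^{p_2} \cdots i_n^{p_n} \pmod Q \qquad (2)$$

and the reduced invariants $\mathcal{I}n_{(p_1,p_2,\dots,p_n)} \pmod Q$ are called modular moments
and absolute modular G-invariants, respectively.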
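A direct evaluation of the modular moments of Definition 1 may be sketched as follows. This is a minimal illustration, not one of the fast algorithms discussed below: the test image, the window size N = 4, the modulus Q = 17, and the function name `modular_moment` are all assumptions made for the example.

```python
from itertools import product

def modular_moment(f, p, N, Q):
    """Eq. (2): sum of f(i) * i_1^p_1 * ... * i_n^p_n over [0, N-1]^n, reduced mod Q."""
    total = 0
    for idx in product(range(N), repeat=len(p)):
        term = f[idx] % Q
        for i, pk in zip(idx, p):
            term = term * pow(i, pk, Q) % Q   # every intermediate value stays below Q
        total = (total + term) % Q
    return total

# Hypothetical 4x4 image with 8 grey levels and a prime modulus Q = 17 > 8.
N, Q = 4, 17
image = {(i, j): (3 * i + 5 * j) % 8 for i in range(N) for j in range(N)}

m_11 = modular_moment(image, (1, 1), N, Q)

# The same moment computed in ordinary integer arithmetic, then reduced.
m_11_integer = sum(image[(i, j)] * i * j for i in range(N) for j in range(N)) % Q

# Raising the orders by multiples of Q - 1 does not change the result (orders >= 1).
m_shifted = modular_moment(image, (1 + (Q - 1), 1 + 2 * (Q - 1)), N, Q)
```

All intermediate results stay in the range [0, Q-1], which is the sense in which every modular moment has the same dynamic range.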

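Note that if Q is a prime then, according to Euler’s theorem, $i^{Q-1} = 1 \pmod Q$
holds for every nonzero element $i \in GF(Q)$ [22]. Hence
$i_1^{p_1 + r_1(Q-1)} = i_1^{p_1} \pmod Q$, ..., $i_n^{p_n + r_n(Q-1)} = i_n^{p_n} \pmod Q$ are true for all
$p_1, \dots, p_n = 0, 1, \dots, Q-2$ and for all $r_1, \dots, r_n = 0, 1, \dots$. Therefore, the
matrix $\mathcal{M} = [m_{(p_1,p_2,\dots,p_n)}] \pmod Q$ is a periodic matrix:
$\mathcal{M}_{(p_1 + r_1(Q-1),\, p_2 + r_2(Q-1),\, \dots,\, p_n + r_n(Q-1))} = \mathcal{M}_{(p_1,p_2,\dots,p_n)}$. The fundamental period
is the nD matrix $[\mathcal{M}_{(p_1,p_2,\dots,p_n)}]$. For n = 2 we have

$$[\mathcal{M}_{(p,q)}] = \begin{bmatrix} \mathcal{M}_{0,0} & \mathcal{M}_{0,1} & \cdots & \mathcal{M}_{0,Q-2} \\ \mathcal{M}_{1,0} & \mathcal{M}_{1,1} & \cdots & \mathcal{M}_{1,Q-2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{M}_{Q-2,0} & \mathcal{M}_{Q-2,1} & \cdots & \mathcal{M}_{Q-2,Q-2} \end{bmatrix} = \left[ \sum_{i=0}^{Q-1} \sum_{j=0}^{Q-1} f(i,j)\, i^{p} j^{q} \right]_{p,q=0}^{Q-2}.$$

The matrix elements $\mathcal{M}_{(p_1,p_2,\dots,p_n)}$ of the matrix $[\mathcal{M}_{(p_1,p_2,\dots,p_n)}]$ are
Fourier-Mellin-Galois spectral coefficients [23]-[24].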

2.2 Fast Calculation Algorithms of Modular Invariants Based on 
NTTs over Galois Fields 

As we have seen, all calculations in Eq. (2) can be realized according to the rules
of GF(Q)-arithmetic if Q is a prime number. Let $\varepsilon$ be a primitive root in the
field GF(Q). Its different powers $1 = \varepsilon^{k_0}$, $2 = \varepsilon^{k_1}$, $3 = \varepsilon^{k_2}$, ..., $Q-1 = \varepsilon^{k_{Q-2}}$
cover the nonzero elements of the field GF(Q) for proper $k_0 = 0, k_1, k_2, \dots, k_{Q-2}$.
If $a = \varepsilon^k$, then k is called the index of a in the base $\varepsilon$ and is denoted by
$k = \mathrm{ind}_\varepsilon a$. Indexes play the same role in the field GF(Q) as logarithms in the
field of the real numbers.
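The analogy between indexes and logarithms can be made concrete with a small table. The sketch below is an illustration only: the prime Q = 17 and the helper names `is_primitive_root` and `index_table` are assumptions, and the search for a primitive root is done by brute force.

```python
def is_primitive_root(g, Q):
    """True if the powers of g run through all Q-1 nonzero residues modulo the prime Q."""
    seen, x = set(), 1
    for _ in range(Q - 1):
        seen.add(x)
        x = x * g % Q
    return len(seen) == Q - 1

def index_table(g, Q):
    """Map a -> ind_g(a) for every nonzero a in GF(Q)."""
    table, x = {}, 1
    for k in range(Q - 1):
        table[x] = k
        x = x * g % Q
    return table

Q = 17
eps = next(g for g in range(2, Q) if is_primitive_root(g, Q))   # smallest primitive root
ind = index_table(eps, Q)

def mul_via_index(a, b):
    """Multiply nonzero a, b in GF(Q) by adding their indexes, as with logarithms."""
    return pow(eps, (ind[a] + ind[b]) % (Q - 1), Q)
```

For Q = 17 the search returns 3 (the element 2 has order 8 and is therefore not primitive), so the multiplication-free variants discussed below, which need the primitive root 2, are not available for this particular modulus.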

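Theorem 1 Additive (Ad) and multiplicative (Mu) computer complexities of
modular moments are equal to

$$\mathrm{Mu} = \mathrm{Ad} = \begin{cases} O(nN^n \log_2 N), & \text{if } N = Q-1, \\ O(nN^n \log_2^2 N), & \text{if } N \ll Q, \end{cases} \quad \text{if } \varepsilon \neq 2, \qquad (3)$$

$$\mathrm{Mu} = 0, \quad \mathrm{Ad} = \begin{cases} O(nN^n \log_2 N), & \text{if } N = Q-1, \\ O(nN^n \log_2^2 N), & \text{if } N \ll Q, \end{cases} \quad \text{if } \varepsilon = 2. \qquad (4)$$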



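Proof: 1. For $N = Q - 1$ let us represent $i_1, \dots, i_n$ in Eq. (2) in the form
$i_1 := \varepsilon^{\underline{i}_1}$, $i_2 := \varepsilon^{\underline{i}_2}$, ..., $i_n := \varepsilon^{\underline{i}_n}$, where $\underline{i}_1 := \mathrm{ind}_\varepsilon i_1$,
$\underline{i}_2 := \mathrm{ind}_\varepsilon i_2$, ..., $\underline{i}_n := \mathrm{ind}_\varepsilon i_n$. Then

$$\mathcal{M}_{(p_1,p_2,\dots,p_n)} = \sum_{\underline{i}_1=0}^{Q-2} \sum_{\underline{i}_2=0}^{Q-2} \cdots \sum_{\underline{i}_n=0}^{Q-2} f(\varepsilon^{\underline{i}_1}, \dots, \varepsilon^{\underline{i}_n})\, \varepsilon^{\underline{i}_1 p_1 + \dots + \underline{i}_n p_n} = \sum_{\underline{i} \in W^n(N)} f(\underline{i})\, \varepsilon^{\langle p | \underline{i} \rangle} \pmod Q, \qquad (5)$$

where $f(\underline{i}) = f(\underline{i}_1, \underline{i}_2, \dots, \underline{i}_n) := f(\varepsilon^{\underline{i}_1}, \dots, \varepsilon^{\underline{i}_n})$, $\underline{i} := (\underline{i}_1, \dots, \underline{i}_n)^t = |\underline{i}\rangle$, and
$p := (p_1, p_2, \dots, p_n) = \langle p|$, $\langle p | \underline{i} \rangle := \underline{i}_1 p_1 + \underline{i}_2 p_2 + \dots + \underline{i}_n p_n$. We obtain a new
calculating algorithm for the modular moments $\mathcal{M}_{(p_1,p_2,\dots,p_n)}$ as the nD
Fourier-Galois Number Theoretical Transform (FG-NTT). Its computational complexity
is defined by the complexity of the fast nD FG-NTT: $O(nN^n \log_2 N)$ additions and
multiplications [23]. We will denote this algorithm as follows:

- $\mathrm{Alg}_1(\text{FG-NTT}_n;\ \varepsilon;\ N = Q-1;\ GF(Q);\ O_{AdMu}(nN^n \log_2 N))$.

Computational complexity of the new algorithm can be reduced by a special choice
of the primitive root $\varepsilon$. Indeed, if $\varepsilon = 2$ then Eq. (5) is reduced to the nD
FG-NTT, which is fulfilled without multiplication. Computational complexity of
such a computational scheme is only $O(nN^n \log_2 N)$ additions. We will denote
this algorithm as follows:

- $\mathrm{Alg}'_1(\text{FG-NTT}_n;\ 2;\ N = Q-1;\ GF(Q);\ O_{Ad}(nN^n \log_2 N);\ 0)$.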

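2. If $N \ll Q$ then Eq. (5) is the nD Vandermonde-Galois transform:

$$\mathcal{M}_{(p_1,p_2,\dots,p_n)} = \sum_{i_1=0}^{N-1} \sum_{i_2=0}^{N-1} \cdots \sum_{i_n=0}^{N-1} f(i_1, i_2, \dots, i_n)\, i_1^{p_1} i_2^{p_2} \cdots i_n^{p_n} \pmod Q. \qquad (6)$$

Computational complexity of the nD Vandermonde-Galois Number Theoretical
Transform (VG-NTT) is $O(nN^n \log_2^2 N)$ additions and multiplications [25], if
$\varepsilon \neq 2$, and $O(nN^n \log_2^2 N)$ additions if $\varepsilon = 2$. We obtain now two versions of the
second algorithm and will denote them as follows:

- $\mathrm{Alg}_2(\text{VG-NTT}_n;\ \varepsilon;\ N \ll Q;\ GF(Q);\ O_{AdMu}(nN^n \log_2^2 N))$,

- $\mathrm{Alg}'_2(\text{VG-NTT}_n;\ 2;\ N \ll Q;\ GF(Q);\ O_{Ad}(nN^n \log_2^2 N);\ 0)$. □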
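The index substitution used in the first part of the proof can be checked on a small 1D example. The sketch below is an illustration under assumptions: the prime Q = 13, the test signal, and the helper names are hypothetical, the pixel i = 0 is left out (it has no index), and both sums are evaluated naively in O(N^2) operations rather than by a fast transform.

```python
def primitive_root(Q):
    """Smallest primitive root of the odd prime Q, found by brute force."""
    for g in range(2, Q):
        x, order = g, 1
        while x != 1:
            x = x * g % Q
            order += 1
        if order == Q - 1:
            return g
    raise ValueError("no primitive root found; is Q an odd prime?")

def moments_direct(f, Q):
    """M_p = sum_{i=1}^{Q-1} f(i) i^p (mod Q) for p = 0..Q-2 (Vandermonde form)."""
    return [sum(f[i] * pow(i, p, Q) for i in range(1, Q)) % Q for p in range(Q - 1)]

def moments_by_ntt(f, Q):
    """Same moments after i = eps^l: M_p = sum_l f(eps^l) eps^(l p) (mod Q)."""
    eps = primitive_root(Q)
    g = [f[pow(eps, l, Q)] for l in range(Q - 1)]   # signal re-indexed by l = ind(i)
    return [sum(g[l] * pow(eps, l * p, Q) for l in range(Q - 1)) % Q
            for p in range(Q - 1)]

Q = 13
f = [(7 * i * i + 3) % 10 for i in range(Q)]        # hypothetical 1D signal, f[0] unused
```

After re-indexing, the kernel depends only on the product l p, which is the structure of a (Q-1)-point number theoretical transform. For Q = 13 the primitive root happens to be 2, so every multiplication by the kernel could in principle be replaced by shifts and additions, as in the primed algorithms.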

2.3 Fast Calculation Algorithms of Modular Invariants Based on
the Discrete Radon Transform



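Let $\{p^\circ\} \subset W^n(N)$ be a minimal vector set such that all rays $\alpha p^\circ$,
$\alpha = 0, 1, \dots, N-1$, cover the whole window $W^n(N)$. Then we can write that

$$\mathcal{M}_p = \mathcal{M}_{(\alpha p^\circ)} = \sum_{i \in W^n(N)} f(i)\, \varepsilon^{\alpha \langle p^\circ | i \rangle} = \sum_{\rho=0}^{N-1} \left( \sum_{\langle p^\circ | i \rangle = \rho} f(i) \right) \varepsilon^{\alpha\rho}$$

or

$$\mathcal{M}_{(\alpha p^\circ)} = \sum_{\rho=0}^{N-1} \hat{f}(p^\circ, \rho)\, \varepsilon^{\alpha\rho}, \quad \text{where } \hat{f}(p^\circ, \rho) := \mathcal{RD}_n\{f(i)\} = \sum_{\langle p^\circ | i \rangle = \rho} f(i). \qquad (7)$$

Definition 2 [26]-[28]. The function $\hat{f}(p^\circ, \rho)$ which is equal to the sum of
values of the signal $f(i)$ on the discrete hyperplane $\langle p^\circ | i \rangle = \rho$ is called the Discrete
Radon Transform (DRT) of $f(i)$.

The expression (7) means that the nD NTT is a composition of the DRT $\mathcal{RD}_n$ and
a set of 1D NTTs. The total number of 1D NTTs is equal to the cardinality of the
set $\{p^\circ\}$. Every 1D NTT acts along the ray $\alpha p^\circ$. It is necessary to find a
set $\{p^\circ\}$ that gives the DRT with minimum computational complexity. Note
that the classical “row/column separable” nD NTT is reduced to $nN^{n-1}$ 1D
NTTs.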



Theorem 2 [29]-[32]. The total number of 1D NTTs in Eq. (7) is equal to

if A = g is prime integer, 

HN = q^, 



9-1 
-1 



k m(n-l) 

n 



2=1 



The total computational complexity of the proposed algorithm for the nD FG-NTT
is $O(\text{FG-NTT}_n) = O(\mathcal{R}_n) + N^{n-1}\, O(\text{FG-NTT}_1)$ instead of $nN^{n-1}\, O(\text{FG-NTT}_1)$
for the classical fast “row/column separable” nD FG-NTT. As a result we obtain
the following total additive $O_{Ad}(nN^n \log_2 N)$ and multiplicative $O_{Mu}(N^n \log_2 N)$
complexities for the nD FG-NTT. We will denote this algorithm as follows:

- $\mathrm{Alg}_3(\mathrm{DRT}_n;\ \text{FG-NTT}_1;\ \varepsilon;\ Q-1;\ GF(Q);\ O_{Ad}(nN^n \log_2 N);\ O_{Mu}(N^n \log_2 N))$.

Therefore the additive computer complexities of the present algorithm and of the
algorithm $\mathrm{Alg}_1$ are equivalent, but the multiplicative computer complexity of the
new algorithm is n times smaller.

2.4 Fast Calculation Algorithms of Modular Invariants Based on
the NTTs over Direct Sum of Galois Fields

The algorithm based on the FG-NTT modulo Q imposes a limitation on the
window size $W^n(N)$. In this case the powers $\varepsilon^k \pmod Q$ cover the spatial
window $W^n(N)$ (exclusively 0, the null column and the null line). The rigid
dependence between N and Q ($N = Q - 1 \approx 2^m$) restricts the choice of N to only
one value: $N = Q - 1$. Let us show that this limitation can be removed via the
Chinese Remainder Theorem (CRT).

Let $Q_1, \dots, Q_k$ be a set of k prime integers such that $\min(Q_1, \dots, Q_k) > N$
and $Q_\Sigma = Q_1 \cdots Q_k$. Then we can imbed the window $W^n(N)$ into k windows

$$W^n(N) \subset W_1^n(Q_1) = [0, Q_1 - 1]^n, \quad \dots, \quad W^n(N) \subset W_k^n(Q_k) = [0, Q_k - 1]^n, \qquad (8)$$

and process the images separately in the k windows modulo $Q_1, \dots, Q_k$:

$$f_1(i_{11}, i_{12}, \dots, i_{1n}) : W_1^n(Q_1) \to GF(Q_1), \quad \dots, \quad f_k(i_{k1}, i_{k2}, \dots, i_{kn}) : W_k^n(Q_k) \to GF(Q_k).$$

This is equivalent to the image processing in one window by a “big” modulo
$Q_\Sigma = Q_1 Q_2 \cdots Q_k$, i.e. as $f(i_1, i_2, \dots, i_n) : W^n(N) \to Z/Q_\Sigma$.

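According to the CRT the moments $\mathcal{M}_{(p_1,\dots,p_n)}$ can be calculated in these k
windows using k $GF(Q_l)$-arithmetics:

$$\mathcal{M}_{(p_{11},p_{12},\dots,p_{1n})} = \sum_{i_{11}=0}^{Q_1-2} \cdots \sum_{i_{1n}=0}^{Q_1-2} i_{11}^{p_{11}} i_{12}^{p_{12}} \cdots i_{1n}^{p_{1n}}\, f_1(i_{11}, i_{12}, \dots, i_{1n}) \pmod{Q_1},$$
$$\vdots \qquad (9)$$
$$\mathcal{M}_{(p_{k1},p_{k2},\dots,p_{kn})} = \sum_{i_{k1}=0}^{Q_k-2} \cdots \sum_{i_{kn}=0}^{Q_k-2} i_{k1}^{p_{k1}} i_{k2}^{p_{k2}} \cdots i_{kn}^{p_{kn}}\, f_k(i_{k1}, i_{k2}, \dots, i_{kn}) \pmod{Q_k},$$

where $f_l(i_{l1}, i_{l2}, \dots, i_{ln}) = f(i_1, i_2, \dots, i_n) \pmod{Q_l}$, $\forall l = 1, 2, \dots, k$, and
$i_{l1}^{p_{l1}} = i_1^{p_1} \pmod{Q_l}$, ..., $i_{ln}^{p_{ln}} = i_n^{p_n} \pmod{Q_l}$, $\forall l = 1, 2, \dots, k$.

Let $\varepsilon_1, \dots, \varepsilon_k$ be primitive roots in the Galois fields $GF(Q_1)$, ..., $GF(Q_k)$,
respectively. If we substitute the expressions

$$i_{11} := \varepsilon_1^{\underline{i}_{11}},\ \dots,\ i_{1n} := \varepsilon_1^{\underline{i}_{1n}}; \quad i_{21} := \varepsilon_2^{\underline{i}_{21}},\ \dots,\ i_{2n} := \varepsilon_2^{\underline{i}_{2n}}; \quad \dots; \quad i_{k1} := \varepsilon_k^{\underline{i}_{k1}},\ \dots,\ i_{kn} := \varepsilon_k^{\underline{i}_{kn}}$$

into Eq. (9) we obtain k NTTs: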
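$$\mathcal{M}_{(p_{11},\dots,p_{1n})} = \sum_{\underline{i}_{11}=0}^{Q_1-2} \cdots \sum_{\underline{i}_{1n}=0}^{Q_1-2} \varepsilon_1^{p_{11}\underline{i}_{11} + \dots + p_{1n}\underline{i}_{1n}}\, f_1(\underline{i}_{11}, \dots, \underline{i}_{1n}) \pmod{Q_1},$$
$$\vdots \qquad (10)$$
$$\mathcal{M}_{(p_{k1},\dots,p_{kn})} = \sum_{\underline{i}_{k1}=0}^{Q_k-2} \cdots \sum_{\underline{i}_{kn}=0}^{Q_k-2} \varepsilon_k^{p_{k1}\underline{i}_{k1} + \dots + p_{kn}\underline{i}_{kn}}\, f_k(\underline{i}_{k1}, \dots, \underline{i}_{kn}) \pmod{Q_k},$$

acting in the k windows (8).

It is not difficult to see that the computational complexity of such a scheme
(using the FG-NTT [33]-[36]) is $\sum_{l=1}^{k} 2Q_l^2 \log_2 Q_l \approx 2kN^2 \log_2 N$ additions and
multiplications. If $\varepsilon_l = 2$, $\forall l = 1, 2, \dots, k$, then the computational complexity is
reduced to $2kN^2 \log_2 N$ additions. We obtain two new algorithms:

- $\mathrm{Alg}_4(\text{FG-NTT}_2;\ \{\varepsilon_l\}_{l=1}^{k};\ N;\ \{GF(Q_l)\}_{l=1}^{k};\ O_{AdMu}(2kN^2 \log_2 N))$,

- $\mathrm{Alg}'_4(\text{FG-NTT}_2;\ \{\varepsilon_l = 2\}_{l=1}^{k};\ N;\ \{GF(Q_l)\}_{l=1}^{k};\ O_{Ad}(2kN^2 \log_2 N);\ 0)$.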
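The role of the CRT in this scheme can be illustrated by recombining residues of a single moment. The sketch below rests on assumptions chosen for the example: a hypothetical 4x4 image, the moduli 5, 7, 11, 13, and naive evaluation of each residue instead of a fast FG-NTT; the helper names `moment_mod` and `crt` are not from the text.

```python
from math import prod

def moment_mod(f, p, q, Q):
    """2D moment sum f(i,j) i^p j^q reduced modulo the prime Q."""
    N = len(f)
    return sum(f[i][j] * pow(i, p, Q) * pow(j, q, Q)
               for i in range(N) for j in range(N)) % Q

def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli into one residue modulo their product."""
    Qs = prod(moduli)
    x = 0
    for r, Q in zip(residues, moduli):
        M = Qs // Q
        x += r * M * pow(M, -1, Q)    # pow(M, -1, Q): modular inverse (Python 3.8+)
    return x % Qs

N = 4
f = [[(i + 2 * j) % 4 for j in range(N)] for i in range(N)]   # hypothetical 4x4 image
moduli = [5, 7, 11, 13]                                       # primes, all greater than N
p, q = 2, 1

residues = [moment_mod(f, p, q, Q) for Q in moduli]           # k independent small fields
combined = crt(residues, moduli)                              # value modulo Q_1 Q_2 Q_3 Q_4
exact = sum(f[i][j] * i**p * j**q for i in range(N) for j in range(N))
```

Since the exact integer moment is smaller than the product of the moduli (5005), the recombined value equals it, even though each of the k computations only ever handled numbers below its own small prime.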



2.5 Fast Calculation Algorithms of Invariant Correlation Function
Based on NTTs over Galois Fields

Using the nD matrix of absolute invariants $\mathcal{I}n\{f\} := [\mathcal{I}n_{(p_1,p_2,\dots,p_n)}\{f\}]$ we
construct the generalized autocorrelation function $COR_f(x_1, x_2, \dots, x_n)$ of the image
$f(x_1, x_2, \dots, x_n)$ as follows:

$$COR_f(x_1, x_2, \dots, x_n) = \sum_{p_1=0}^{\infty} \cdots \sum_{p_n=0}^{\infty} \mathcal{I}n_{(p_1,p_2,\dots,p_n)}\{f\}\, e_{p_1}(x_1)\, e_{p_2}(x_2) \cdots e_{p_n}(x_n),$$

or in the matrix form by

$$COR_f(x_1, x_2, \dots, x_n) = [x_1^{p_1}]^{-1} \otimes [x_2^{p_2}]^{-1} \otimes \dots \otimes [x_n^{p_n}]^{-1}\, [\mathcal{I}n_{(p_1,p_2,\dots,p_n)}\{f\}],$$

where $[e_{p_i}](x_i) := [x_i^{p_i}]^{-1}$, $i = 1, 2, \dots, n$, are the inverse Vandermonde Matrix
Transforms. Note that all samples of autocorrelation functions have the same
dynamic range whereas the moments (and the moment invariants) have different
ranges. Thus information about the image represented in the moment invariants
is not of equal value, but in the autocorrelation function it is represented by
equal value.

For the modular autocorrelation function we have 

N-l) Q-1 

CC>7^/(^l,*2, ■ • ■ Un) = ^ E 4pi.P2.....Pn)U (modQi;). 

Pl=0 Pn=0 
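
Let us represent $i_1, \dots, i_n$ in the last equation in the form $i_1 := \varepsilon^{\underline{i}_1}$, $i_2 := \varepsilon^{\underline{i}_2}$,
..., $i_n := \varepsilon^{\underline{i}_n}$, where $\underline{i}_1 := \mathrm{ind}_\varepsilon i_1$, $\underline{i}_2 := \mathrm{ind}_\varepsilon i_2$, ..., $\underline{i}_n := \mathrm{ind}_\varepsilon i_n$. Then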

N-l N-1 

C07^/(^l,^2,...,^„) = E - E (modQi:). 

pi—0 Pn—0 

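We obtain two calculation algorithms of the invariant correlation function

- $\mathrm{Alg}_5(\text{FG-NTT}_n;\ \varepsilon;\ N \ll Q;\ Z/Q_\Sigma;\ O_{AdMu}(2nN^n \log_2 N))$,

- $\mathrm{Alg}'_5(\text{FG-NTT}_n;\ 2;\ N \ll Q;\ Z/Q_\Sigma;\ O_{Ad}(2nN^n \log_2 N);\ 0)$,

and

- $\mathrm{Alg}_6(\text{VG-NTT}_n;\ \varepsilon;\ N \ll Q;\ Z/Q_\Sigma;\ O_{AdMu}(2nN^n \log_2^2 N))$,

- $\mathrm{Alg}'_6(\text{VG-NTT}_n;\ 2;\ N \ll Q;\ Z/Q_\Sigma;\ O_{Ad}(2nN^n \log_2^2 N);\ 0)$.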

Let us consider the limitations which have to be imposed on $Q_\Sigma = Q_1 Q_2 \cdots Q_k$
and N if the FG-NTT is used for the correlation function computation. If
computations in modular arithmetic coincide with computations in the classical
integer arithmetic then $\mathcal{COR}_f(i_1, i_2, \dots, i_n) = COR_f(i_1, i_2, \dots, i_n) < Q_\Sigma$,
$(i_1, i_2, \dots, i_n) \in [0, N-1]^n$. If $\max f(i_1, i_2, \dots, i_n) = A$, and

$$|COR(i_1, i_2, \dots, i_n)| \le \sum_{i_1=0}^{N-1} \cdots \sum_{i_n=0}^{N-1} A^2 < Q_\Sigma,$$

i.e. if $Q_\Sigma > N^n A^2$ or $A < \sqrt{Q_\Sigma / N^n}$, then $\mathcal{COR}_f(i_1, i_2, \dots, i_n) =
COR_f(i_1, i_2, \dots, i_n)$.

For example, let n = 2, N = 64, and $Q_1 = 67$, $Q_2 = 71$, $Q_3 = 73$,
$Q_4 = 79$. Then $A < \sqrt{Q_\Sigma}/N = \sqrt{67 \cdot 71 \cdot 73 \cdot 79}/64 \approx 81$. This means that
$\mathcal{COR}_f(i_1, \dots, i_n) = COR_f(i_1, \dots, i_n)$ if $0 \le f(i_1, i_2, \dots, i_n) < 81$.
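The numerical example can be reproduced with exact integer arithmetic. The following sketch assumes the reading of the bound given above, namely that the largest admissible grey level A is the largest integer with N^n A^2 < Q_Σ; the variable names are chosen for the illustration.

```python
from math import isqrt, prod

moduli = [67, 71, 73, 79]
N, n = 64, 2

Q_sigma = prod(moduli)                      # the "big" modulus Q_1 Q_2 Q_3 Q_4

# Largest integer A with N^n * A^2 < Q_sigma, computed without floating point.
A_max = isqrt((Q_sigma - 1) // N**n)
```

Here Q_sigma is 27433619 and the square root of Q_sigma divided by 64 is about 81.8, so grey levels up to 81 are safe, in agreement with the bound quoted in the text.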

3 Fast Calculation Algorithms of Complex-Valued
Invariants Based on Fourier-Clifford-Gauss-Galois
Transforms

In this section our aim is to reduce the complexity of the two previous stages of
the calculating algorithm by applying continuous complex arithmetic. We propose new
modular complex-valued moments. They are relative invariants with respect to
a wide class of geometrical distortions. The use of complex arithmetic
makes it unnecessary to complete the second step (Kravtchook transform) in
the global recognition algorithm. Therefore, the new invariants can be measured
directly from an image without the calculation of moments. Furthermore, we
use modular Clifford-Gauss-Galois arithmetic [37]-[41] for the fast computation
of the low- and higher-order complex-valued invariants.

3.1 Complex Moments and Invariants of 2D Images

Let $f(x, y)$ be a 2D grey-level image, where $(x, y) \in R^2$. This function can
be considered on the generalized complex plane $GC_2(R|1,I)$: $f(z) := f(x, y)$,
where $z = x + Iy \in GC_2(R|1,I)$, and I is the generalized imaginary unit
($I^2 = -1, 0, +1$). These 2D generalized complex numbers form the 2D
generalized spatial complex Cayley-Klein algebra $A_2(R|1,I)$ spanned on two main
elements 1, I, with $I^2 := \delta = -1, 0, 1$ [10]. In the first case ($I^2 = i^2 = -1$)
the 2D algebra forms the field of complex numbers, in the second ($I^2 = \epsilon^2 = 0$) the
algebra of dual numbers and in the third ($I^2 = e^2 = 1$) the algebra of double
numbers. They are denoted as $A_2(R|1,i) := R + Ri$, $A_2(R|1,\epsilon) := R + R\epsilon$,
$A_2(R|1,e) := R + Re$, respectively. When one speaks about all three algebras
simultaneously then it concerns the algebra of generalized complex numbers, that is
denoted as $A_2(R|1,I) := R + RI$.

Let $c \in GC_2$ be the centroid of the image $f(z)$.

Definition 3 [2], [10]. Functionals of the form

$$m_p\{f\} = \int_{z \in GC_2} (z - c)^p f(z)\, dz, \qquad m_{pq}\{f\} = \int_{z \in GC_2} (z - c)^p\, \overline{(z - c)}^{\,q} f(z)\, dz \qquad (11)$$

are called one- and two-index $A_2(R|1,I)$-valued central fractional moments of
the 2D image $f(z)$, respectively, where $p, q \in Q$ are rational numbers.

If $f(z)$ is the initial image then $f_{v,w}(z) = f(v(z + w)) := f(z^*)$ denotes its
geometrically distorted copy. Here v, w are fixed complex numbers. Summing w
with z brings us to image translation by the vector w; multiplication of z + w by
v is equivalent to the rotation of the vector z + w by the angle $\varphi$ (where $\varphi = \arg(v)$)
and to the dilatation by the factor |v|.

Theorem 3 [10]. Central moments of the image $f(z)$ are relative
$A_2(R|1,I)$-valued invariants

$$m_p\{f_{v,w}\} = v^p |v|^2\, m_p\{f\} = e^{Ip\varphi} |v|^{p+2}\, m_p\{f\}, \qquad (12)$$

with respect to the small affine group $\mathrm{aff}(GC_2(R|1,I))$ with $A_2(R|1,I)$-valued
multiplicators $v^p |v|^2 = e^{Ip\varphi} |v|^{p+2}$.

The following ratios $\eta_p := m_p / m_0^{(p+2)/2}$ are called unary normalized moments.
These moments are relative $A_2(R|1,I)$-valued invariants with respect to the
affine group $\mathrm{aff}(A_2(R|1,I))$ of the generalized complex plane $GC_2(R|1,I)$ with
$A_2(R|1,I)$-valued multiplicators because $\eta_p\{f_{\varphi,|v|,w}\} = e^{Ip\varphi}\, \eta_p\{f\}$.

Let us calculate the modulus of the left and right parts of the last equation:
$|\eta_p\{f_{\varphi,|v|,w}\}| = |\eta_p\{f\}|$. As a result we obtain the following theorem.

Theorem 4 [10]. Moduli of unary moments $|\eta_p\{f\}|$ are absolute scalar-valued
invariants $\mathcal{I}n_p\{\mathrm{aff}(A_2(R|1,I)) \,|\, f\}$ of the small affine group $\mathrm{aff}(A_2(R|1,I))$.

3.2 Fast Calculation Algorithms of $A_2(R|1,I)$-Valued Modular
Invariants Based on NTTs over GF($Q^2$) Arithmetic

Let us concentrate on the problem of the fast calculation of $A_2(R|1,I)$-valued
moments:

$$m_p = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} (x + Iy)^p f(x, y), \qquad m_{pq} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} (x + Iy)^p (x - Iy)^q f(x, y). \qquad (13)$$

Here we can use modular complex arithmetic with different moduli. We will
consider two types of rings $Z_I/Q$, where $Q \in Z$ or $Q \in Z_I$.

Let, for example, I = i. The ring $Z_i/Q = \{a + ib \mid a, b \in GF(Q)\} =
GF(Q) + iGF(Q) = GF(Q^2)$ contains $Q^2 - 1$ nonzero Clifford Gauss-integers
(CG-integers). We have to work with the CG-integers from this ring
as with ordinary complex integers, and final results are calculated modulo Q.

Case 1 (1-index moments). Let $\varepsilon$ be a primitive root in the Galois field
GF($Q^2$), $t = \mathrm{ind}_\varepsilon(x + iy)$, then $(x + iy) = \varepsilon^t$. Substituting the latter equation
into Eq. (13) we obtain

$$\mathcal{M}_p = \sum_{x=0}^{Q-1} \sum_{y=0}^{Q-1} (x + iy)^p f(x + iy) = \sum_{t=0}^{Q^2-2} \varepsilon^{tp} f(\varepsilon^t) \pmod Q \qquad (14)$$

that is a 1D $(Q^2 - 1)$-point Fourier-Clifford-Gauss-Galois NTT
(FCGG-NTT) [23], [33]-[34].

Case 2 (2-index moments). For two-index moments $\mathcal{M}_{pq}$ a slightly changed computational scheme is also valid. In the Galois field $\mathrm{GF}(Q^2)$ the equation $(x + iy)^Q = (x - iy) \pmod{Q}$ holds. Hence, we can write

$$\mathcal{M}_{pq} = \sum_{x=0}^{Q-1}\sum_{y=0}^{Q-1} (x+iy)^{p+Qq} f(x+iy) \pmod{Q}. \qquad (15)$$

Let $r = p + Qq = (p, q)$ be a 2-digit number written in the $Q$-radix number system. Then

$$\mathcal{M}(p,q) = \mathcal{M}_r = \sum_{x=0}^{Q-1}\sum_{y=0}^{Q-1} (x+iy)^{r} f(x+iy) \pmod{Q}. \qquad (16)$$

Let $\varepsilon$ be a primitive root in $\mathrm{GF}(Q^2)$. Then $\varepsilon^t = x + iy = (x, y)$ in $\mathrm{GF}(Q^2)$, where $t = \mathrm{ind}_\varepsilon(x + iy)$. Substituting the latter equation into Eq. (16) we obtain

$$\mathcal{M}_{pq} = \mathcal{M}_r = \sum_{t=0}^{Q^2-2} \varepsilon^{tr} f(\varepsilon^t) \pmod{Q}, \qquad r = 0, 1, \ldots, Q^2 - 1. \qquad (17)$$

As a result, the computation of complex moments is reduced to a $(Q^2 - 1)$-point FCGG-NTT [23], [35]-[36]. In both cases we get the algorithms:

- $\mathrm{Alg}_7(\text{FCGG-NTT}_1;\ \varepsilon;\ N = \sqrt{Q};\ \mathrm{GF}(Q^2);\ O_{AdMu}(2N^2 \log_2 N))$,

- $\mathrm{Alg}'_7(\text{FCGG-NTT}_1;\ 2;\ N = \sqrt{Q};\ \mathrm{GF}(Q^2);\ O_{Ad}(2N^2 \log_2 N);\ o)$.

The computational complexity of this algorithm is equal to $O_{AdMu}(2N^2 \log_2 N)$ (for $\mathrm{Alg}_7$) if $\varepsilon \ne 2$, and $O_{Ad}(2N^2 \log_2 N)$ (for $\mathrm{Alg}'_7$) if $\varepsilon = 2$.
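The reduction of two-index moments to one-index moments in Case 2 rests on the identity $(x + iy)^Q = x - iy \pmod{Q}$. The sketch below checks that identity and the resulting equality $\mathcal{M}_{pq} = \mathcal{M}_{p+Qq}$ by brute force. The prime $Q = 7$ (with $Q \equiv 3 \pmod 4$), the test image and the helper names are assumptions for illustration only.

```python
from itertools import product

Q = 7  # assumed small prime with Q % 4 == 3

def gmul(a, b):
    """Multiply two CG-integers modulo Q."""
    return ((a[0] * b[0] - a[1] * b[1]) % Q, (a[0] * b[1] + a[1] * b[0]) % Q)

def gpow(a, n):
    """Raise a to the non-negative power n by repeated squaring."""
    result = (1, 0)
    while n:
        if n & 1:
            result = gmul(result, a)
        a = gmul(a, a)
        n >>= 1
    return result

def conj(a):
    """Conjugate x - iy of a = x + iy, reduced modulo Q."""
    return (a[0], (-a[1]) % Q)

def image(x, y):
    """Hypothetical test image with values in GF(Q)."""
    return (2 * x * x + y + 3) % Q

def gsum(terms):
    """Sum CG-integers modulo Q."""
    total = (0, 0)
    for t in terms:
        total = ((total[0] + t[0]) % Q, (total[1] + t[1]) % Q)
    return total

PIXELS = list(product(range(Q), repeat=2))

def moment_pq(p, q):
    """Two-index moment: sum of (x+iy)^p (x-iy)^q f(x, y)."""
    return gsum(gmul(gmul(gpow(z, p), gpow(conj(z), q)), (image(*z), 0))
                for z in PIXELS)

def moment_r(r):
    """One-index moment: sum of (x+iy)^r f(x, y)."""
    return gsum(gmul(gpow(z, r), (image(*z), 0)) for z in PIXELS)

# Raising to the power Q acts as complex conjugation in GF(Q^2).
frobenius_holds = all(gpow(z, Q) == conj(z) for z in PIXELS)

# Eq. (15)-(16): M_pq equals M_r with r = p + Q*q.
two_index_matches = all(moment_pq(p, q) == moment_r(p + Q * q)
                        for p in range(Q) for q in range(Q))
```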



3.3 Fast Calculation Algorithms of $\mathcal{A}_2(\mathbf{R}|1,I)$-Valued Modular Invariants Based on NTTs over $\mathbf{Z}_I/Q$-Arithmetic

Let us investigate the case when the modulus is a complex number $Q = A + iB$, where $(A, B) = 1$.

Theorem 5 (The Gauss theorem [23], [42]). Let $Q = A + iB$ and $(A, B) = 1$. Then the ring $\mathbf{Z}_i/(A + iB)$ is isomorphic to the ring $\mathbf{Z}/|Q|$, where $|Q| = A^2 + B^2$: $\mathbf{Z}_i/(A + iB) \simeq \mathbf{Z}/|Q|$. The one-to-one correspondence is realized as a process of realification of the complex CG-integers $a + ib$, $\mathcal{R}(a + ib) \to h = a + \rho b$, where $\rho = (-A/B) \pmod{|Q|}$, and as a process of complexification of the real integers $h$

$\mathcal{C}(h) = x + iy =$ (18)

which gives the transition from $h$ to $x + iy$, where $\langle hA \rangle_{|Q|} := hA \pmod{|Q|}$ and $\langle hB \rangle_{|Q|} := hB \pmod{|Q|}$.

The Gauss theorem means that the $\mathbf{Z}_i/Q$- and $\mathbf{Z}/|Q|$-arithmetic laws are similar. For this case we have the following expression for modular moments

$$\mathcal{M}_p = \sum_{(x+iy) \in \mathbf{Z}_i/Q} (x+iy)^p f(x+iy) \pmod{Q}, \qquad (19)$$

instead of Eq. (13). Let $\rho$ be the Gauss coefficient. Realification of the left and right parts of Eq. (19) gives

$$\mathcal{R}(\mathcal{M}_p) = \sum_{(x+\rho y) \in \mathbf{Z}/|Q|} (x+\rho y)^p f(x+\rho y) = \sum_{h=0}^{|Q|-1} h^p f(h) \pmod{|Q|}, \qquad (20)$$

where $f(h) := f(x + \rho y)$ and $h = x + \rho y$. Let the complex modulus $Q$ be selected so that $|Q|$ is a prime integer (for example, for $Q = 1 + 16i$ we have $|Q| = 1^2 + 16^2 = 257$). Then $\mathbf{Z}/|Q| = \mathrm{GF}(|Q|)$ is a Galois field. Let us select a primitive root $\varepsilon$ in this Galois field and let $h = \varepsilon^k$. Then

$$\mathcal{R}(\mathcal{M}_p) = \sum_{k=0}^{|Q|-2} (\varepsilon^k)^p f(\varepsilon^k) \pmod{|Q|} = \sum_{k=0}^{|Q|-2} \varepsilon^{kp} f(k) \pmod{|Q|} \qquad (21)$$

is a 1D $(|Q| - 1)$-point FG-NTT, where $f(k) := f(\varepsilon^k)$. This case is reduced to the problem considered in the previous section and gives the following algorithms:

- $\mathrm{Alg}_8(\text{FG-NTT}_1;\ \varepsilon;\ N = \sqrt{|Q|};\ \mathbf{Z}_i/Q;\ O_{AdMu}(N^2 \log_2 N))$,

- $\mathrm{Alg}'_8(\text{FG-NTT}_1;\ 2;\ N = \sqrt{|Q|};\ \mathbf{Z}_i/Q;\ O_{Ad}(N^2 \log_2 N);\ o)$.

Complex-valued moments are easily reconstructed from the calculated values: $\mathcal{M}_p = \mathcal{C}\{\mathcal{R}(\mathcal{M}_p)\}$, which describes the process of complexification of real-valued moments.
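The realification and complexification maps of the Gauss theorem can be illustrated for the example modulus $Q = 1 + 16i$, $|Q| = 257$. This is a minimal sketch: `realify` follows the rule $h = a + \rho b$ from the text, while the explicit formula inside `complexify` is an assumption (one standard way to pick a representative $x + iy$ congruent to $h$ modulo $Q$) and is not taken from the text. All function names are hypothetical.

```python
A, B = 1, 16                    # Q = A + iB, the example modulus from the text
M = A * A + B * B               # |Q| = 257, a prime
rho = (-A * pow(B, -1, M)) % M  # Gauss coefficient rho = -A/B (mod |Q|)

def realify(a, b):
    """Map the CG-integer a + ib to h = a + rho*b (mod |Q|)."""
    return (a + rho * b) % M

def complexify(h):
    """Return one CG-integer (x, y) with x + iy congruent to h modulo Q.

    Assumed formula: x = (A<hA> + B<hB>)/|Q|, y = (B<hA> - A<hB>)/|Q|,
    where <.> denotes reduction modulo |Q|.
    """
    ha, hb = (h * A) % M, (h * B) % M
    x, rx = divmod(A * ha + B * hb, M)
    y, ry = divmod(B * ha - A * hb, M)
    if rx or ry:
        raise ArithmeticError("numerators are expected to be multiples of |Q|")
    return x, y

def gauss_mul(u, v):
    """Ordinary product of the Gaussian integers u and v."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])

# Complexification followed by realification returns the original integer.
round_trip_ok = all(realify(*complexify(h)) == h for h in range(M))

# Realification respects multiplication, because rho^2 = -1 (mod |Q|).
u, v = (3, -5), (7, 11)
product_ok = realify(*gauss_mul(u, v)) == (realify(*u) * realify(*v)) % M
```

Because realification is a ring homomorphism, a moment computed with ordinary integer arithmetic modulo 257 can be mapped back to a complex value afterwards, which is the design choice the section relies on.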



4 Fast Calculation Algorithms of Quaternion Invariants Based on Fourier-Clifford-Hamilton Transforms

There is currently considerable interest in methods of invariant 3-D image recognition. Indeed, very often information about 3-D objects can be obtained by computer tomographic reconstruction, 3-D magnetic resonance imaging, passive 3-D sensors or active range finders. For this reason, algorithms for the systematic derivation of 3-D moment invariants should be developed for 3-D object recognition.

In this section we propose an elegant theory which describes many such invariants. Our theory is based on quaternion theory. We propose quaternion-valued invariants, which are related to the descriptions of objects as the zero sets of implicit polynomials. These are global invariants which show great promise for recognition of complicated objects. Quaternion-valued invariants have good discriminating power for computer recognition of 3-D objects using statistical pattern recognition methods. For fast computation of low- and higher-order quaternion-valued invariants we will use modular arithmetic of Galois fields and rings, which maps the calculation of quaternion-valued invariants to the fast Fourier-Clifford-Hamilton-Galois NTT and which reduces the computational complexity of the global recognition algorithm from $O(3N^4)$ to $O(N^3)$ for 3D grey-level images.

The Hamilton quaternions (4-D hypercomplex numbers) of the form $q = w + ix + jy + kz$, where $w, x, y, z \in \mathbf{R}$, form the 4-D algebra $\mathcal{H}_4(\mathbf{R}|1,i,j,k) := \mathbf{R} + \mathbf{R}i + \mathbf{R}j + \mathbf{R}k$ [43]. This noncommutative number system is characterized by three imaginary units $i, j, k$ which satisfy the multiplication rules $i^2 = j^2 = k^2 = ijk = -1$.

4.1 Generalized Quaternions 

The 4-D Hamiltonian algebra $\mathcal{H}_4(\mathbf{R}|1,i,j,k) := \mathbf{R} + \mathbf{R}i + \mathbf{R}j + \mathbf{R}k$ can be considered as a 2-D algebra over the field of complex numbers:

$$\mathcal{H}_2(\mathbf{C}|1,j) := \mathbf{C} + \mathbf{C}j = \{a + bj \mid a, b \in \mathbf{C}\}. \qquad (22)$$

Surely, in (22) for the hyperimaginary unit $j$ we have $j^2 = -1$. But we can set $j^2 = -1, 0, 1$ and take generalized complex numbers; then we obtain a new quaternion algebra:

$$\mathcal{GH}_2(\mathcal{A}_2(\mathbf{R}|1,I) \mid 1,J) = \mathcal{A}_2(\mathbf{R}|1,I) + \mathcal{A}_2(\mathbf{R}|1,I)J = \{a + bJ \mid a, b \in \mathcal{A}_2(\mathbf{R}|1,I)\}, \qquad (23)$$

where $I^2 = -1, 0, 1$ and $J^2 = -1, 0, 1$. This generalized quaternion algebra was proposed by Clifford [44]. Introducing the designations $I, J, K$ for all three new hyperimaginary units, we can represent this 2-D algebra as a 4-D algebra over the real field

$$\mathcal{GHA}_4(\mathbf{R}|1,I,J,K) = \mathcal{A}_2(\mathbf{R}|1,I) + \mathcal{A}_2(\mathbf{R}|1,I)J = \mathbf{R} + \mathbf{R}I + \mathbf{R}J + \mathbf{R}K.$$

Every generalized quaternion $q$ and its conjugate have a unique representation in the form $q = t + xI + yJ + zK$, $\bar{q} = t - xI - yJ - zK$, where $t, x, y, z$ are real numbers, and the product

$$q\bar{q} = \|a\|^2_{\mathcal{A}_2(\mathbf{R})} - J^2\|b\|^2_{\mathcal{A}_2(\mathbf{R})} = (t^2 - I^2x^2) - J^2(y^2 - I^2z^2) \qquad (24)$$

is called the pseudonorm of the generalized quaternion $q$, where $\|a\|_{\mathcal{A}_2(\mathbf{R})}$, $\|b\|_{\mathcal{A}_2(\mathbf{R})}$ are the moduli of the generalized complex numbers $a = t + xI$ and $b = y + zI$ in the algebra $\mathcal{A}_2(\mathbf{R}|1,I)$. It can take both positive and negative values. If the pseudodistance between two generalized quaternions $p$ and $q$ is defined as the modulus of their difference, $\rho(p, q) = |p - q|$, then the algebra $\mathcal{GHA}_4(\mathbf{R}|1,I,J,K)$ of generalized quaternions is transformed into a 4-D pseudometric space denoted $\mathcal{GH}_4$. Surely, there are nine such spaces.
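The pseudonorm (24) can be verified for all nine choices of $(I^2, J^2)$. The sketch below assumes the multiplication table generated by $I^2 = \alpha$, $J^2 = \beta$ and $K = IJ = -JI$ (so that $K^2 = -\alpha\beta$); the tuple layout $(t, x, y, z)$ and the function names are illustrative assumptions.

```python
from itertools import product

def gq_mul(p, q, alpha, beta):
    """Product of generalized quaternions p, q given as (t, x, y, z).

    Assumed rules: I*I = alpha, J*J = beta, K = I*J = -J*I.
    """
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return (
        t1 * t2 + alpha * x1 * x2 + beta * y1 * y2 - alpha * beta * z1 * z2,
        t1 * x2 + x1 * t2 - beta * (y1 * z2 - z1 * y2),
        t1 * y2 + y1 * t2 + alpha * (x1 * z2 - z1 * x2),
        t1 * z2 + z1 * t2 + x1 * y2 - y1 * x2,
    )

def gq_conj(q):
    """Conjugate t - xI - yJ - zK."""
    t, x, y, z = q
    return (t, -x, -y, -z)

def pseudonorm(q, alpha, beta):
    """Right-hand side of Eq. (24)."""
    t, x, y, z = q
    return (t * t - alpha * x * x) - beta * (y * y - alpha * z * z)

q = (2, 3, -1, 4)  # arbitrary integer test quaternion
checks = {
    (alpha, beta): gq_mul(q, gq_conj(q), alpha, beta)
    == (pseudonorm(q, alpha, beta), 0, 0, 0)
    for alpha, beta in product((-1, 0, 1), repeat=2)
}
```

With $\alpha = \beta = -1$ the table reduces to the Hamilton rules, for example `gq_mul((0, 1, 0, 0), (0, 0, 1, 0), -1, -1)` gives the unit $K$.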

The subspace of pure vector generalized quaternions $xI + yJ + zK$ is the 3-D space $\mathcal{GR}_3 := \mathrm{Vec}\{\mathcal{GH}_4\}$. The pseudometrics introduced in $\mathcal{GH}_4$ induce corresponding pseudometrics in $\mathcal{GR}_3$, the expressions for which are obtained from $\rho(p, q)$ for $t = 0$. There are only three such non-trivial pseudometrics:

$$\rho(\mathrm{Vec}\{p\}, \mathrm{Vec}\{q\}) = |\mathrm{Vec}\{p\} - \mathrm{Vec}\{q\}| = |\mathrm{Vec}\{u\}| = \sqrt{\|xI + yJ + zK\|_{\mathcal{GH}_4}} = \sqrt{\|xI\|^2_{\mathcal{GC}_2} - J^2\|y + zI\|^2_{\mathcal{GC}_2}} = \begin{cases} \sqrt{x^2 + y^2 + z^2}, \\ \sqrt{x^2 - (y^2 + z^2)}, \\ |x|. \end{cases}$$

The corresponding 3-D metric spaces form Euclidean, Minkowskian, and Galilean 3-D pseudometric spaces. When we are dealing with all three pseudometric spaces we will use the symbol $\mathcal{GR}_3$.
As is known, generalized complex numbers and classical quaternions of unit modulus have the forms $z = e^{I\varphi} = \cos\varphi + I\sin\varphi$ and $q = e^{u_0\varphi} = \cos\varphi + u_0\sin\varphi$, where $\cos\varphi$, $\sin\varphi$ are trigonometric functions in the corresponding 2D $\mathcal{GC}_2$-geometries. A generalized quaternion of unit modulus can be written in the same form [44]: $q = e^{u\varphi} = \cos\varphi + u\sin\varphi$, where $\varphi$ is a rotation angle around the pure vector quaternion $u$ of unit modulus ($|u| = 1$, $u = -\bar{u}$).



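Definition 4 [44]. Transformations $q' = q + p$, $q' = \lambda q$,

$$q' = e^{u_1\varphi_1} q, \qquad q'' = q\, e^{-u_2\varphi_2}, \qquad q''' = e^{u_1\varphi_1/2}\, q\, e^{-u_2\varphi_2/2}, \qquad (25)$$

where $q, p \in \mathcal{GH}_4$, $\lambda \in \mathbf{R}^+$, are called translations, dilations and rotations of the 4-D space $\mathcal{GH}_4$, respectively.

They form the translation group $\mathrm{Tr}(\mathcal{GH}_4)$, the dilation group $\mathrm{M}(\mathcal{GH}_4)$, and the rotation groups $\mathrm{SO}_L(\mathcal{GH}_4)$, $\mathrm{SO}_R(\mathcal{GH}_4)$, $\mathrm{SO}_{LR}(\mathcal{GH}_4)$.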



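Theorem 6 [44]. The transforms

$$q' = e^{u_1\varphi_1} q + p, \qquad q'' = q\, e^{-u_2\varphi_2} + p, \qquad q''' = e^{u_1\varphi_1/2}\, q\, e^{-u_2\varphi_2/2} + p$$

form three groups of left $\mathrm{MOV}_L(\mathcal{GH}_4)$, right $\mathrm{MOV}_R(\mathcal{GH}_4)$, and double-sided $\mathrm{MOV}_{LR}(\mathcal{GH}_4)$ motions of the space $\mathcal{GH}_4 = \mathrm{Sc}\{\mathcal{GH}_4\} \oplus \mathrm{Vec}\{\mathcal{GH}_4\} = \mathbf{R} \oplus \mathcal{GR}_3$.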



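If we set $u_1 = u_2$, $\varphi_1 = \varphi_2$ in the $LR$-transform (25), then it maps the real axis $\mathbf{R} = \mathrm{Sc}\{\mathcal{GH}_4\}$ and the 3D vector subspace $\mathcal{GR}_3 = \mathrm{Vec}\{\mathcal{GH}_4\}$ into themselves: $\mathbf{R} = q^{-1}\mathbf{R}q$, $\mathcal{GR}_3 = q^{-1}\mathcal{GR}_3 q$. Hence, the transforms $x' = q^{-1}xq$, $q = e^{u\varphi/2}$, $x \in \mathcal{GR}_3$, are rotations of the $\mathcal{GR}_3$-space around the point 0 and form the group of its rotations $\mathrm{SO}(\mathcal{GR}_3)$.

Theorem 7 [44]. Every motion of the 3D space $\mathcal{GR}_3$ is represented in the form

$$x' = Q^{-1}xQ + p, \qquad Q := e^{u\varphi/2}, \qquad x \in \mathcal{GR}_3. \qquad (26)$$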



4.2 Quaternion Moments and Invariants of 3 D Images 

Let $f(q)$ be a 3D image depending on the pure vector generalized quaternion $q \in \mathrm{Vec}\{\mathcal{GH}_4\} = \mathcal{GR}_3$. Let $c$ be the centroid of the image $f(q)$.

Definition 5 [10], [45]. Functionals of the form

$$\mathcal{M}_p\{f\} = \int_{q \in \mathcal{GR}_3} (q - c)^p f(q)\, dq \qquad (27)$$

are called one-index $\mathcal{GHA}_4$-valued central fractional moments of the 3-D image $f(q)$, where $\mathcal{GHA}_4 := \mathcal{GHA}_4(\mathbf{R}|1,I,J,K)$, $dq := dx\,dy\,dz$, and $p \in \mathbf{Q}$ is a rational number.

Let us find the rules by which moments change under geometrical distortions of the initial image. These distortions are caused by translation $q \to q + a$, rotation $q \to QqQ^{-1}$, where $Q = e^{u\varphi/2}$, and dilatation $x \to \lambda x$, where $\lambda \in \mathbf{R}^+ \setminus 0$. If $f(q)$ is the initial image and $f_{\lambda Qa}(q)$ its distorted version, then

$$f_{\lambda Qa}(q) := f(\lambda Q(q + a)Q^{-1}) = f(q^*), \qquad (28)$$

where $q^* := \lambda Q(q + a)Q^{-1}$.

Theorem 8 [10]. Central moments $\mathcal{M}_p$ are relative $\mathcal{GHA}_4$-valued invariants

$$\mathcal{M}_p\{f_{\lambda Qa}\} = \lambda^{p+3} Q^p \mathcal{M}_p\{f\} Q^{-p}, \qquad (29)$$

of the small affine group $\mathrm{aff}(\mathcal{GR}_3)$ with left $Q^p$ and right $Q^{-p}$ multiplicators.

Being invariants, they will be denoted as $\mathcal{J}_p\{\mathrm{aff}(\mathcal{GR}_3) | f\} := \mathcal{M}_p\{f, c\} := \mathcal{M}_p\{f\}$.

Obviously, the ratios $\mathcal{N}_p := \mathcal{M}_p / \mathcal{M}_0^{(p+3)/3}$ can be called normalized moments. These moments are relative $\mathcal{GHA}_4$-valued invariants

$$\mathcal{N}_p\{f_{\lambda Qa}\} := Q^p \mathcal{N}_p\{f\} Q^{-p} \qquad (30)$$

of the small affine group $\mathrm{aff}(\mathcal{GR}_3)$ with left $Q^p$ and right $Q^{-p}$ multiplicators, respectively. As $\mathcal{GH}_4$-valued relative invariants have both left and right multiplicators, ordinary multiplication cannot produce absolute invariants. Let us show now that such invariants exist among the moduli of unary moments. To this end, calculate the modulus of the left-hand and right-hand sides of Eq. (30): $|\mathcal{N}_p\{f_{\lambda Qa}\}| = |\mathcal{N}_p\{f\}|$.

Theorem 9 [10]. The moduli of moments

$$|\mathcal{N}_p\{f_{\lambda Qa}\}| = |\mathcal{N}_p\{f\}| = I_p\{\mathrm{aff}(\mathcal{GR}_3) | f\}$$

are absolute scalar-valued invariants of the affine group.

4.3 Fast Calculation of Quaternion-Valued Invariants Based on Fourier-Clifford-Hamilton Transforms

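For the digital estimation of one-index quaternion-valued moments we have

$$\mathcal{M}_p = \sum_{x=0}^{N-1}\sum_{y=0}^{N-1}\sum_{z=0}^{N-1} (xI + yJ + zK)^p f(x,y,z) = \sum_{x=0}^{N-1}\sum_{y=0}^{N-1}\sum_{z=0}^{N-1} [xI + (y + zI)J]^p f[xI + (y + zI)J] =$$

$$= \left[\sum_{x=0}^{N-1} x^p f_1(x)\right] I^p + \left[\sum_{y=0}^{N-1}\sum_{z=0}^{N-1} (y - zI)^{\lfloor p/2 \rfloor}(y + zI)^{\lceil p/2 \rceil} f_2(y,z)\right] J^p, \qquad (31)$$

where

$$f_1(x) := \sum_{y=0}^{N-1}\sum_{z=0}^{N-1} f[xI + (y + zI)J], \qquad f_2(y,z) := \sum_{x=0}^{N-1} f[xI + (y + zI)J].$$

In Eq. (31) we used the equalities $[xI + (y + zI)J]^p = x^p I^p + [(y + zI)J]^p$ and $[(y + zI)J]^p = (y - zI)^{\lfloor p/2 \rfloor}(y + zI)^{\lceil p/2 \rceil} J^p$, because $IJ = -JI$ and $(wJ)^2 = wJwJ = w\bar{w}JJ = w\bar{w}J^2$ for $w = y + zI$.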

In the modular case, 3D numbers of the form $[xI + (y + zI)J]$, where $x, y, z \in \mathrm{GF}(Q)$, form a 3D modular space of the type

$$\mathrm{GF}(Q)I + \mathrm{GF}(Q)J + \mathrm{GF}(Q)K = \mathrm{GF}(Q)I + [\mathrm{GF}(Q) + \mathrm{GF}(Q)I]J.$$






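This is the vector part of the modular 4D quaternion algebra $\mathrm{GF}(Q) + [\mathrm{GF}(Q)I + \mathrm{GF}(Q)J + \mathrm{GF}(Q)K]$. Let $Q$ be a prime such that $[\mathrm{GF}(Q) + \mathrm{GF}(Q)I]$ is the Galois field $\mathrm{GF}(Q^2)$; then the vector part is represented as the following sum

$$\mathrm{GF}(Q)I + [\mathrm{GF}(Q) + \mathrm{GF}(Q)I]J = \mathrm{GF}(Q)I + \mathrm{GF}(Q^2)J.$$

Let $\varepsilon$ and $\mathcal{E}$ be primitive roots in the Galois fields $\mathrm{GF}(Q)$ and $\mathrm{GF}(Q^2)$, respectively. Obviously, $\varepsilon = \mathcal{E}^{Q+1}$. Then we can write $x = \varepsilon^t$, $(y + Iz) = \mathcal{E}^s$, where $t = \mathrm{ind}_\varepsilon(x)$, $x \in \mathrm{GF}(Q)$, and $s = \mathrm{ind}_{\mathcal{E}}(y + Iz)$, $(y + Iz) \in \mathrm{GF}(Q^2)$. Substituting the latter equations into Eq. (31) we obtain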

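$$\mathcal{M}_p \pmod{Q} = \sum_{x=0}^{Q-1}\sum_{y=0}^{Q-1}\sum_{z=0}^{Q-1} [xI + (y + zI)J]^p f[xI + (y + zI)J] \pmod{Q} =$$

$$= \left[\sum_{t=0}^{Q-2} \varepsilon^{tp} f_1(\varepsilon^t)\right] I^p + \left[\sum_{s=0}^{Q^2-2} \mathcal{E}^{s(\lfloor p/2 \rfloor Q + \lceil p/2 \rceil)} f_2(\mathcal{E}^s)\right] J^p \pmod{Q}.$$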



We obtain a new algorithm for calculating modular quaternion-valued moments $\mathcal{M}_p$ as a 2-D Fourier-Clifford-Hamilton-Galois NTT [10], [45] of the image $f(\varepsilon^t, \mathcal{E}^s)$. Its computational complexity is determined by the complexity of the fast algorithm for this transform [23]: $[Q^2(Q - 1) + Q\log_2 Q] + [Q(Q^2 - 1) + Q^2\log_2 Q^2] \approx 2Q^3$ additions and $\le Q\log_2 Q + 2Q^2\log_2 Q = (2Q + 1)Q\log_2 Q$ multiplications. The computational complexity of this algorithm can be reduced by a special choice of primitive roots. Indeed, if $\varepsilon = \pm 2$ and $\mathcal{E} = \pm 2, \pm 2(1 \pm I)$, the computational complexity is reduced to only $2Q^2\log_2 Q$ additions. We obtain now two versions of the new algorithm and will denote them as follows:



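- $\mathrm{Alg}_9(\text{FCHG-NTT}_2;\ \varepsilon;\ \mathcal{E};\ Q^2;\ \mathrm{GF}(Q);\ O_{Ad}(2Q^3),\ O_{Mu}(2Q^2\log_2 Q))$,

- $\mathrm{Alg}'_9(\text{FCHG-NTT}_2;\ \pm 2;\ \pm 2(1 \pm I);\ Q^2;\ \mathrm{GF}(Q);\ O_{Ad}(2Q^3);\ o)$.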



5 Fast Calculation Algorithms of Invariants of Multicolor Images Based on the Multiplet-Fourier-Clifford-Galois NTTs

The concept of color and multispectral image recognition connects all the topics we have considered. In this work, the term "multicomponent (multispectral, multicolor) image" is defined for an image with more than one component. An RGB image is an example of a color image featuring three separate image components R (red), G (green), and B (blue).

Our main hypothesis is: the brain of primates calculates hypercomplex-valued invariants of an image during recognition [13]-[21], [37]-[41]. Visual systems in primates and animals with different evolutionary history use different hypercomplex algebras. For example, the human brain uses 3D hypercomplex (triplet) numbers to recognize color (RGB) images and mantis shrimps use 10D multiplet numbers to recognize multicolor images. From this point of view the visual cortex of a primate's brain can be considered as a "Clifford algebra quantum computer" [10].

In this section we propose a novel method to calculate invariants of color and multicolor images. It employs the idea of multidimensional hypercomplex numbers and combines it with the idea of number theoretical transforms over hypercomplex algebras, which reduces the computational complexity of the global recognition algorithm from $O(knN^{n+1})$ to $O(knN^n \log N)$ for $n$D $k$-multispectral images.

5.1 Multiplet Numbers 

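The multicomponent color image of an object is measured as a $k$-component vector

$$\mathbf{f}_{mcol}(\mathbf{x}) = \begin{bmatrix} f_1(x,y,z) \\ f_2(x,y,z) \\ \vdots \\ f_k(x,y,z) \end{bmatrix} = \begin{bmatrix} \int_\lambda S^{obj}(\lambda; x,y,z) H_1(\lambda)\, d\lambda \\ \int_\lambda S^{obj}(\lambda; x,y,z) H_2(\lambda)\, d\lambda \\ \vdots \\ \int_\lambda S^{obj}(\lambda; x,y,z) H_k(\lambda)\, d\lambda \end{bmatrix} \qquad (32)$$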

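where $H_1(\lambda), H_2(\lambda), \ldots, H_k(\lambda)$ are sensor sensitivity functions. For example, if $k = 3$ and $f_1(x,y,z) = f_R(x,y,z)$, $f_2(x,y,z) = f_G(x,y,z)$, $f_3(x,y,z) = f_B(x,y,z)$, then we have color (RGB) images. We will interpret such images as hypercomplex-valued signals

$$\mathbf{f}_{mcol}(x,y,z) = f_0 1 + f_1(x,y,z)\varepsilon^1_{mcol} + \ldots + f_{k-1}(x,y,z)\varepsilon^{k-1}_{mcol}, \qquad (33)$$

which take values in the multiplet algebra $\mathcal{A}_k(\mathbf{R}|1, \varepsilon^1_{mcol}, \ldots, \varepsilon^{k-1}_{mcol})$, where $\varepsilon^k_{mcol} = 1$. In particular, RGB color images are represented as triplet-valued functions:

$$\mathbf{f}_{col}(x,y,z) = f_R(x,y,z)1_{col} + f_G(x,y,z)\varepsilon_{col} + f_B(x,y,z)\varepsilon^2_{col}.$$

Multiplet numbers are represented in basic form by

$$\mathcal{C} = c_0 1 + c_1\varepsilon_{mcol} + c_2\varepsilon^2_{mcol} + \ldots + c_{k-1}\varepsilon^{k-1}_{mcol},$$

where $1, \varepsilon_{mcol}, \ldots, \varepsilon^{k-1}_{mcol}$ are hyperimaginary units and $\varepsilon^k_{mcol} = 1$. They form the multiplet algebra

$$\mathcal{A}_k(\mathbf{R}) = \mathcal{A}_k(\mathbf{R}|1, \varepsilon_{mcol}, \ldots, \varepsilon^{k-1}_{mcol}) := \mathbf{R}1 + \mathbf{R}\varepsilon_{mcol} + \mathbf{R}\varepsilon^2_{mcol} + \ldots + \mathbf{R}\varepsilon^{k-1}_{mcol},$$

which we will call the multicolor algebra and denote as $\mathcal{A}^{mcol}_k(\mathbf{R}|1, \varepsilon_{mcol}, \ldots, \varepsilon^{k-1}_{mcol})$, or briefly as $\mathcal{A}^{mcol}_k$.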

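One can show (see [47]) that the multiplet algebra is the direct sum of the real and complex fields $\mathbf{R}$, $\mathbf{C}$:

$$\mathcal{A}^{mcol}_k = \begin{cases} \mathbf{R} \cdot e^1_{lu} + \mathbf{R} \cdot e^2_{lu} + \sum_{j=1}^{k/2-1} [\mathbf{C} \cdot E^j_{ch}], & \text{if } k \text{ is even}, \\ \mathbf{R} \cdot e_{lu} + \sum_{j=1}^{(k-1)/2} [\mathbf{C} \cdot E^j_{ch}], & \text{if } k \text{ is odd}, \end{cases} \qquad (34)$$

where $e^i_{lu}$ and $E^j_{ch}$ are orthogonal idempotent "real" and "complex" hyperimaginary units, respectively, such that $(e^i_{lu})^2 = e^i_{lu}$, $(E^j_{ch})^2 = E^j_{ch}$, and $e^i_{lu}E^j_{ch} = E^j_{ch}e^i_{lu} = 0$ for all $i, j$. For example, the triplet color algebra is the direct sum of the real $\mathbf{R}$ and complex $\mathbf{C}$ fields:

$$\mathcal{A}^{col}_3 = \mathcal{A}^{col}_3(\mathbf{R}|1, \varepsilon_{col}, \varepsilon^2_{col}) := \mathbf{R}1_{col} + \mathbf{R}\varepsilon_{col} + \mathbf{R}\varepsilon^2_{col} = \mathbf{R} \cdot e_{lu} + \mathbf{C} \cdot E_{ch}, \qquad (35)$$

where $e_{lu} := (1 + \varepsilon_{col} + \varepsilon^2_{col})/3$ and $E_{ch} := (1 + \omega\varepsilon_{col} + \omega^2\varepsilon^2_{col})/3$ are orthogonal idempotent "real" and "complex" hyperimaginary units, respectively, $\omega := e^{2\pi\sqrt{-1}/3}$, and $e^2_{lu} = e_{lu}$, $E^2_{ch} = E_{ch}$, $e_{lu}E_{ch} = E_{ch}e_{lu} = 0$. Here $\varepsilon^3_{col} = 1$. Therefore, every triplet $\mathcal{C}$ is a linear combination $\mathcal{C} = a \cdot e_{lu} + z \cdot E_{ch}$ of the "scalar" and "complex" parts $a e_{lu}$, $z E_{ch}$, respectively. The real numbers $a \in \mathbf{R}$ we will call intensity (luminance) numbers and the complex numbers $z = b + jc \in \mathbf{C}$ we will call chromaticity numbers.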

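Let $k_{lu} = 0, 1, 2$ and let $k_{ch}$ be the number of "complex" units in Eq. (34), that is, $k_{ch} = k/2 - 1$ for even $k$ and $k_{ch} = (k-1)/2$ for odd $k$. Every multiplet $\mathcal{C}$ can be represented as a linear combination of $k_{lu}$ "scalar" parts and $k_{ch}$ "complex" parts:

$$\mathcal{C} = \sum_{i=1}^{k_{lu}} (a_i \cdot e^i_{lu}) + \sum_{j=1}^{k_{ch}} (z_j \cdot E^j_{ch}).$$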

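The real numbers $a_i \in \mathbf{R}$ are called intensity numbers and the complex numbers $z_j = b + ic \in \mathbf{C}$ are called multichromaticity numbers. The two main arithmetic operations have a very simple form in this representation:

$$\mathcal{C}^1 \pm \mathcal{C}^2 = \left[\sum_{i=1}^{k_{lu}} (a^1_i e^i_{lu}) + \sum_{j=1}^{k_{ch}} (z^1_j E^j_{ch})\right] \pm \left[\sum_{i=1}^{k_{lu}} (a^2_i e^i_{lu}) + \sum_{j=1}^{k_{ch}} (z^2_j E^j_{ch})\right] = \sum_{i=1}^{k_{lu}} (a^1_i \pm a^2_i)\, e^i_{lu} + \sum_{j=1}^{k_{ch}} (z^1_j \pm z^2_j)\, E^j_{ch},$$

$$\mathcal{C}^1 \cdot \mathcal{C}^2 = \left[\sum_{i=1}^{k_{lu}} (a^1_i e^i_{lu}) + \sum_{j=1}^{k_{ch}} (z^1_j E^j_{ch})\right] \cdot \left[\sum_{i=1}^{k_{lu}} (a^2_i e^i_{lu}) + \sum_{j=1}^{k_{ch}} (z^2_j E^j_{ch})\right] = \sum_{i=1}^{k_{lu}} (a^1_i a^2_i)\, e^i_{lu} + \sum_{j=1}^{k_{ch}} (z^1_j z^2_j)\, E^j_{ch}.$$

Multiplet algebras possess divisors of zero and form number rings.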
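The idempotent decomposition of the triplet algebra can be checked numerically. The sketch below represents a triplet $c_0 + c_1\varepsilon_{col} + c_2\varepsilon_{col}^2$ as a coefficient tuple and multiplies by cyclic convolution, which encodes $\varepsilon_{col}^3 = 1$. Allowing complex coefficients (so that $E_{ch}$ can be written with $\omega$) is an assumption of this sketch, as are the helper names.

```python
import cmath

def tmul(a, b):
    """Triplet product: cyclic convolution of coefficients, since eps^3 = 1."""
    return tuple(sum(a[i] * b[(k - i) % 3] for i in range(3)) for k in range(3))

def close(a, b, tol=1e-12):
    """Component-wise comparison with a small tolerance."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

w = cmath.exp(2j * cmath.pi / 3)      # omega, a primitive cube root of unity
e_lu = (1 / 3, 1 / 3, 1 / 3)          # "real" idempotent (1 + eps + eps^2)/3
E_ch = (1 / 3, w / 3, w * w / 3)      # "complex" idempotent from the text
zero = (0, 0, 0)

def combine(a, z):
    """Triplet a*e_lu + z*E_ch with intensity a and chromaticity z."""
    return tuple(a * e + z * E for e, E in zip(e_lu, E_ch))

idempotent_lu = close(tmul(e_lu, e_lu), e_lu)
idempotent_ch = close(tmul(E_ch, E_ch), E_ch)
orthogonal = close(tmul(e_lu, E_ch), zero)   # two nonzero zero divisors

# Multiplication acts separately on the intensity and chromaticity parts.
c1 = combine(2.0, 1 + 2j)
c2 = combine(-3.0, 0.5 - 1j)
componentwise = close(tmul(c1, c2), combine(2.0 * -3.0, (1 + 2j) * (0.5 - 1j)))
```

The vanishing product of `e_lu` and `E_ch` is a concrete instance of the divisors of zero mentioned above.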



5.2 Algebraic Model of Perceptual Multispectral Spaces 

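The multicomponent color 3D image (32)-(33) we will interpret as a multiplet-valued 3D signal

$$\mathbf{f}_{mcol}(q) = \sum_{i=1}^{k_{lu}} [f^i_{lu}(q) \cdot e^i_{lu}] + \sum_{j=1}^{k_{ch}} [f^j_{ch}(q) \cdot E^j_{ch}], \qquad q \in \mathcal{GR}^{Sp}_3 = \mathcal{GR}(3|I,J,K), \qquad (36)$$

which takes values in the multiplet algebra $\mathcal{A}^{mcol}_k$. Such algebras generalize the classical HSV model of perceptive color space to multispectral spaces.

We will use the generalized complex algebra $\mathcal{A}_2(\mathbf{R}|I)$ in (34) instead of the complex field $\mathbf{C}$. As a result we obtain the new generalized multiplet algebras

$$\mathcal{GA}^{mcol}_k = \sum_{i=1}^{k_{lu}} \mathbf{R} \cdot e^i_{lu} + \sum_{j=1}^{k_{ch}} \mathcal{A}_2(\mathbf{R}|I^{ch}_j) \cdot E^j_{ch}. \qquad (37)$$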

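Definition 6 [10], [48], [49]. A multispectral 3D image of the form

$$\mathbf{f}_{mcol}(q) = \sum_{i=1}^{k_{lu}} [f^i_{lu}(q) \cdot e^i_{lu}] + \sum_{j=1}^{k_{ch}} [f^j_{ch}(q) \cdot E^j_{ch}], \qquad (38)$$

where $\mathbf{f}_{mcol}(q) \in \mathcal{GA}^{mcol}_k$, is called a multiplet-valued image, and the generalized multiplet color algebra (37) is called the generalized perceptive multicolor space.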

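Definition 7 [10], [49]. If $c$ is the centroid of the image $\mathbf{f}_{mcol}$, then the functionals

$$\mathfrak{M}_p := \int_{q \in \mathcal{GR}^{Sp}_3} (q - c)^p\, \mathbf{f}_{mcol}(q)\, dq = \sum_{i=1}^{k_{lu}} \left(\int_{q \in \mathcal{GR}^{Sp}_3} (q - c)^p f^i_{lu}(q)\, dq\right) \cdot e^i_{lu} + \sum_{j=1}^{k_{ch}} \left(\int_{q \in \mathcal{GR}^{Sp}_3} (q - c)^p f^j_{ch}(q)\, dq\right) \cdot E^j_{ch}$$

are called central $\mathcal{GA}^{SpMcol}$-valued moments of the multicolor 3D image $\mathbf{f}_{mcol}(q)$.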



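Here all products of the type $q^p\, \mathbf{f}_{mcol}$, where $q \in \mathcal{GHA}^{Sp}_4$, $\mathbf{f}_{mcol} \in \mathcal{GA}^{mcol}_k$, are spatial-multicolor numbers belonging to the generalized spatial-color algebra [10]

$$\mathcal{GA}^{SpMcol} = \begin{cases} \mathcal{GHA}^{Sp}_4 \cdot \mathcal{GA}^{mcol}_k, & \text{if } \mathcal{A}^{Sp}_2(\mathbf{R}|I^{Sp}) = \mathcal{A}_2(\mathbf{R}|I^{ch}_j),\ \forall j = 1, \ldots, k_{ch}, \\ \mathcal{GHA}^{Sp}_4 \otimes \mathcal{GA}^{mcol}_k, & \text{if } \mathcal{A}^{Sp}_2(\mathbf{R}|I^{Sp}) \ne \mathcal{A}_2(\mathbf{R}|I^{ch}_j),\ \forall j = 1, \ldots, k_{ch}. \end{cases}$$

For example, let $q \in \mathcal{GHA}^{Sp}_4$ be a generalized quaternion and $\mathbf{f}_{mcol} = \mathbf{f}_{col} \in \mathcal{GA}^{col}_3$; then the product

$$q\, \mathbf{f}_{col} \in \begin{cases} \mathcal{GHA}^{Sp}_4 \cdot \mathcal{GA}^{col}_3 = \mathcal{GA}^{SpCol}_8, & \text{if } \mathcal{A}^{Sp}_2(\mathbf{R}|I^{Sp}) = \mathcal{A}_2(\mathbf{R}|I^{ch}), \\ \mathcal{GHA}^{Sp}_4 \otimes \mathcal{GA}^{col}_3 = \mathcal{GA}^{SpCol}_{12}, & \text{if } \mathcal{A}^{Sp}_2(\mathbf{R}|I^{Sp}) \ne \mathcal{A}_2(\mathbf{R}|I^{ch}). \end{cases}$$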





In the first case the spatial-color numbers $q\, \mathbf{f}_{col}$ are 8D generalized biquaternions and in the second case they are 12D Hurwitz numbers belonging to the generalized Hurwitzion algebra [50].

Changes in the surrounding world, such as changes of intensity, color or illumination, can be treated in the language of the multiplet algebra as the action of the multiplicative group $(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k) := \mathrm{M}(\mathcal{GA}^{mcol}_k) \times \mathrm{SO}(\mathcal{GA}^{mcol}_k)$ in the perceptual multicolor space, where $\mathrm{M}(\mathcal{GA}^{mcol}_k)$ and $\mathrm{SO}(\mathcal{GA}^{mcol}_k)$ are the dilatation and the rotation groups of the generalized perceptive multicolor space $\mathcal{GA}^{mcol}_k$, respectively.

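Let $\mathcal{A} = \sum_{i=1}^{k_{lu}} [a^i_{lu} \cdot e^i_{lu}] + \sum_{j=1}^{k_{ch}} [z^j_{ch} \cdot E^j_{ch}] \in [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$. Let us clarify the rules of moment transformation with respect to $[(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$-multicolor and $[\mathrm{aff}(\mathcal{GR}^{Sp}_3)]$-geometrical distortions of initial images. If

$$\mathbf{f}_{mcol}(z) = \sum_{i=1}^{k_{lu}} [f^i_{lu}(z) \cdot e^i_{lu}] + \sum_{j=1}^{k_{ch}} [f^j_{ch}(z) \cdot E^j_{ch}], \qquad \mathbf{f}_{mcol}(q) = \sum_{i=1}^{k_{lu}} [f^i_{lu}(q) \cdot e^i_{lu}] + \sum_{j=1}^{k_{ch}} [f^j_{ch}(q) \cdot E^j_{ch}]$$

are the initial 2D and 3D multicolor images, then

$${}^{\mathcal{A}}_{v,a}\mathbf{f}_{mcol}(z) = \mathcal{A}\, \mathbf{f}_{mcol}(v(z + a)) = \sum_{i=1}^{k_{lu}} [a^i_{lu} f^i_{lu}(v(z + a))\, e^i_{lu}] + \sum_{j=1}^{k_{ch}} [z^j_{ch} f^j_{ch}(v(z + a))\, E^j_{ch}],$$

$${}^{\mathcal{A}}_{\lambda Qa}\mathbf{f}_{mcol}(q) = \mathcal{A}\, \mathbf{f}_{mcol}(\lambda Q(q + a)Q^{-1}) = \sum_{i=1}^{k_{lu}} [a^i_{lu} f^i_{lu}(\lambda Q(q + a)Q^{-1})\, e^i_{lu}] + \sum_{j=1}^{k_{ch}} [z^j_{ch} f^j_{ch}(\lambda Q(q + a)Q^{-1})\, E^j_{ch}]$$

denote their multicolor and geometric distorted copies, respectively.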

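Theorem 10 [10]. The central moments $\mathfrak{M}_p$ of the color images $\mathbf{f}_{mcol}(z)$, $\mathbf{f}_{mcol}(q)$ are relative $\mathcal{GA}^{SpMcol}$-valued invariants

$$\mathfrak{J}_p\{{}^{\mathcal{A}}_{v,a}\mathbf{f}_{mcol}\} := \mathfrak{M}_p\{{}^{\mathcal{A}}_{v,a}\mathbf{f}_{mcol}\} = \mathcal{A}\, v^p |v|^2\, [\mathfrak{M}_p\{\mathbf{f}_{mcol}\}], \qquad (39)$$

$$\mathfrak{J}_p\{{}^{\mathcal{A}}_{\lambda Qa}\mathbf{f}_{mcol}\} := \mathfrak{M}_p\{{}^{\mathcal{A}}_{\lambda Qa}\mathbf{f}_{mcol}\} = \lambda^{p+3} \mathcal{A}\, Q^p\, [\mathfrak{M}_p\{\mathbf{f}_{mcol}\}]\, Q^{-p} \qquad (40)$$

with respect to the spatial-multicolor groups $[\mathrm{aff}(\mathcal{GC}^{Sp}_2)] \times [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$ and $[\mathrm{aff}(\mathcal{GR}^{Sp}_3)] \times [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$, respectively.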

Let us consider 2D multicolor images.

Definition 8 [10]. The products $\mathfrak{M}^{k_1}_{p_1} \cdots \mathfrak{M}^{k_s}_{p_s}$ are called $s$-ary $\mathcal{GA}^{SpMcol}$-valued central moments, where $k_i \in \mathbf{Q}$.



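Theorem 11 [10]. The $s$-ary central moments of the 2D image $\mathbf{f}(z)$ are relative $\mathcal{GA}^{SpMcol}$-valued invariants

$$\mathfrak{J}^{k_1,\ldots,k_s}_{p_1,\ldots,p_s}\{{}^{\mathcal{A}}_{v,a}\mathbf{f}_{mcol}\} = \mathfrak{M}^{k_1}_{p_1}\{{}^{\mathcal{A}}_{v,a}\mathbf{f}_{mcol}\} \cdots \mathfrak{M}^{k_s}_{p_s}\{{}^{\mathcal{A}}_{v,a}\mathbf{f}_{mcol}\} = v^{(p_1k_1 + \ldots + p_sk_s)} (\mathcal{A}|v|^2)^{(k_1 + \ldots + k_s)}\, \mathfrak{J}^{k_1,\ldots,k_s}_{p_1,\ldots,p_s}\{\mathbf{f}_{mcol}\}$$

with respect to the spatial-multicolor group $[\mathrm{aff}(\mathcal{GC}^{Sp}_2)] \times [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$ with $\mathcal{GA}^{SpMcol}$-valued multiplicators $v^{(p_1k_1 + \ldots + p_sk_s)} (\mathcal{A}|v|^2)^{(k_1 + \ldots + k_s)}$, which have $s$ free parameters $k_1, \ldots, k_s$.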



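The $s$-ary moments of the form $\mathfrak{M}^{k_1}_{p_1} \cdots \mathfrak{M}^{k_s}_{p_s}$, where $k_1p_1 + \ldots + k_sp_s = 0$ and $k_1 + \ldots + k_s = 0$, are called normalized central one-index moments. Normalized central one-index moments are by definition absolute complex-valued invariants with respect to the spatial-multicolor group $[\mathrm{aff}(\mathcal{GC}^{Sp}_2)] \times [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$. Being invariants, they will be denoted as $\mathfrak{N}^{k_1,\ldots,k_s}_{p_1,\ldots,p_s}\{\mathbf{f}_{mcol}\}$.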

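For 3D multicolor images we obtain the following results. Obviously, the ratios $\mathfrak{N}_p := \mathfrak{M}_p / \mathfrak{M}_0^{(p+3)/3}$ are normalized moments. They are relative $\mathcal{GA}^{SpMcol}$-valued invariants

$$\mathfrak{N}_p\{{}^{\mathcal{A}}_{\lambda Qa}\mathbf{f}_{mcol}\} := Q^p\, \mathfrak{N}_p\{\mathbf{f}_{mcol}\}\, Q^{-p} \qquad (41)$$

with respect to the spatial-multicolor group $[\mathrm{aff}(\mathcal{GR}^{Sp}_3)] \times [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$ with left $Q^p$ and right $Q^{-p}$ multiplicators, respectively.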



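Theorem 12. The moduli of unary moments $|\mathfrak{N}_p\{{}^{\mathcal{A}}_{\lambda Qa}\mathbf{f}_{mcol}\}| = |\mathfrak{N}_p\{\mathbf{f}_{mcol}\}|$ are absolute scalar-valued invariants with respect to the spatial-multicolor group $[\mathrm{aff}(\mathcal{GR}^{Sp}_3)] \times [(\mathrm{M} \times \mathrm{SO})(\mathcal{GA}^{mcol}_k)]$.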



5.3 Fast Calculation Algorithms of Multiplet Invariants of Multispectral Images Based on Multiplet-Fourier-Clifford Transforms

As every term of a discrete multicolor image

$$\mathbf{f}_{mcol}(m,n) = f_0(m,n)1 + f_1(m,n)\varepsilon^1_{mcol} + \ldots + f_{k-1}(m,n)\varepsilon^{k-1}_{mcol}$$

has $2^n$ gray levels, there are no principal limits for considering the mathematical model of every term as a function which has its values in the Galois field $\mathrm{GF}(Q)$:

$$f_i(m,n): [0, N-1]^2 \to \mathrm{GF}(Q), \qquad i = 0, 1, \ldots, k-1,$$

if $Q > 2^n$. In this case numbers of the form $a_0 + a_1\varepsilon^1_{mcol} + \ldots + a_{k-1}\varepsilon^{k-1}_{mcol}$, where $a_i \in \mathrm{GF}(Q)$, are called modular multiplets. They form the modular multiplet algebra:

$$\mathcal{GA}^{mcol}_k(\mathrm{GF}(Q)|1, \varepsilon^1_{mcol}, \ldots, \varepsilon^{k-1}_{mcol}) = \mathrm{GF}(Q) + \mathrm{GF}(Q)\varepsilon^1_{mcol} + \ldots + \mathrm{GF}(Q)\varepsilon^{k-1}_{mcol}.$$

One can show that for special values of $Q$ the modular multiplet algebra is the direct sum of the Galois fields $\mathrm{GF}(Q)$ and $\mathrm{GF}(Q^2)$:

$$\mathcal{GA}^{mcol}_k(\mathrm{GF}(Q)) := \sum_{i=1}^{k_{lu}} \mathrm{GF}(Q) \cdot e^i_{lu} + \sum_{j=1}^{k_{ch}} \mathrm{GF}(Q^2) \cdot E^j_{ch}.$$
i=i j=i 

Definition 9. A multispectral discrete image of the form

$$\mathbf{f}_{mcol}(m,n): [0, N-1]^2 \to \mathcal{GA}^{mcol}_k(\mathrm{GF}(Q)) := \sum_{i=1}^{k_{lu}} \mathrm{GF}(Q)\, e^i_{lu} + \sum_{j=1}^{k_{ch}} \mathrm{GF}(Q^2)\, E^j_{ch}$$

is called a modular multiplet-valued image.

This model also allows computers to process image values according to $\mathrm{GF}(Q)$-arithmetic laws.

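Definition 10. Functionals $\mathcal{M}_p\{\mathbf{f}_{mcol}\} := \mathfrak{M}_p\{\mathbf{f}_{mcol}\} \pmod{Q} :=$

$$= \sum_{m=0}^{Q-1}\sum_{n=0}^{Q-1} (m + In)^p\, \mathbf{f}_{mcol}(m + In) = \sum_{i=1}^{k_{lu}} \left(\sum_{m=0}^{Q-1}\sum_{n=0}^{Q-1} (m + In)^p f^i_{lu}(m + In)\right) \cdot e^i_{lu} + \sum_{j=1}^{k_{ch}} \left(\sum_{m=0}^{Q-1}\sum_{n=0}^{Q-1} (m + In)^p f^j_{ch}(m + In)\right) \cdot E^j_{ch} \pmod{Q}$$

are called modular $\mathcal{GA}^{SpMcol}(\mathrm{GF}(Q))$-valued moments of the multicolor image $\mathbf{f}_{mcol}(z)$.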

Let $\mathcal{E}$ be a primitive root in the Galois field $\mathrm{GF}(Q^2)$; then $m + In = \mathcal{E}^l$, and

$$\mathcal{M}_p\{\mathbf{f}_{mcol}\} = \sum_{i=1}^{k_{lu}} \left(\sum_{l=0}^{Q^2-2} \mathcal{E}^{lp} f^i_{lu}(\mathcal{E}^l)\right) \cdot e^i_{lu} + \sum_{j=1}^{k_{ch}} \left(\sum_{l=0}^{Q^2-2} \mathcal{E}^{lp} f^j_{ch}(\mathcal{E}^l)\right) \cdot E^j_{ch} \pmod{Q}.$$

We obtain a new algorithm for calculating modular moments $\mathcal{M}_p$ as $(k_{lu} + k_{ch})$ Multiplet-Fourier-Clifford-Gauss-Galois NTTs:

$$\mathrm{Alg}_{10}(\text{MFCGG-NTT}_1;\ \mathcal{E};\ N = Q;\ \mathcal{A}_k(\mathrm{GF}(Q));\ O_{AdMu}((k_{lu} + k_{ch})N^2 \log_2 N)).$$

Its computational complexity is determined by the complexity of $(k_{lu} + k_{ch})$ fast Multiplet-Fourier-Clifford transforms: $[k_{lu} + k_{ch}][N^2 \log_2 N]$ additions and multiplications. The computational complexity of the new algorithm can be reduced by a special choice of the primitive root $\mathcal{E}$. Indeed, if $\mathcal{E} = \pm 2, \pm 2I, \pm(1 \pm I)$ or $2(1 \pm I)$, then the Multiplet-Fourier-Clifford transform is reduced to the computation of a 2-D fast Rader transform, which can be done without multiplication. The computational complexity of such a computational scheme is only $[k_{lu} + k_{ch}]N^2 \log_2 N$ additions.

6 Conclusions 

High computational speed is the most important property of pattern recognition algorithms. Unfortunately, the direct method suffers from high complexity: it needs $O(nN^{n+1})$ operations to evaluate $N^n$ moment invariants. Using modular arithmetic of Galois fields and fast number theoretical transforms we reduced the computational complexity of the first stage of the global recognition algorithm from $O(nN^{n+1})$ to $O(nN^n \log_2 N)$ for $n$D grey-level images. We developed six algorithms with low complexities, $\mathrm{Alg}_1$-$\mathrm{Alg}_6$.

We have shown that it is possible to use complex Clifford-Gauss arithmetic and different Fourier-Clifford-Gauss-Galois NTTs for the direct and fast calculation of absolute complex-valued invariants. We constructed two new fast algorithms, $\mathrm{Alg}_7$ and $\mathrm{Alg}_8$. The computational complexity of these algorithms is equal to $O(2N^2 \log_2 N)$. Naturally, the method is subject to several conditions and assumptions. Foremost among them is the spatial window limitation. However, this limit can be removed using the multiwindow technique in combination with the Chinese Remainder Theorem.

We have presented a novel algebraic tool for the integration of data from multiple sensors into a uniform representation. We have provided explicit expressions for relative and absolute quaternion-valued invariants of color and $k$-multispectral 2D and 3D images with respect to geometrical and color distortions. The behavior of relative invariants with respect to the more important subgroups of the spatial-color groups is studied in detail. Our technique uses high-dimensional hypercomplex algebras and reduces the computational complexity of the global recognition algorithm from $O(kN^3)$ to $O(kN^2 \log N)$ for 2-D $k$-multicolor images.

Abstract. In this paper we will use the mathematical framework of geometric algebra (GA) to illustrate the construction of articulated motion models. The advantages of solving the forward kinematics of a given problem in this way are that the equations can be constructed in a coordinate-free fashion using the GA representations of rotation, rotors. One can then use the rotor parameters as variables in a tracking scheme where we use Kalman filters to track real motion data from moving subjects; this has various advantages over the standard Euler angle approach. The paper then looks at the advantages of this system for solving the inverse kinematics, that is, estimating the model given the positions/observations. It will be shown that the often complicated inversion procedures can be simplified by a combination of incidence geometry and rotor inversion.

Keywords: Rotations, geometric algebra, articulated motion, motion 
estimation, motion modelling, tracking, Kalman filters, forward and in- 
verse kinematics, conformal geometry. 



1 Introduction 

The main driving force behind the development of the modelling techniques 
we will describe in subsequent sections has been the need to provide fast and 
efficient algorithms for optical motion capture. Optical motion capture is a rela- 
tively cheap method of producing 3D reconstructions of a subject’s motion over 
time, the results of which can be used in a variety of applications; biomechanics, 
robotics, medicine, animation etc. Using a system with few cameras (3 or 4) 
we find that in order to reliably match and track the data (consisting of bright 
markers placed at strategic points on the subject) we must use realistic models 
of the possible motion. Once the data has been tracked using such models, we 
are in a position to analyse the motion in terms of the rotors we have recovered. 

The mathematical language we will use throughout will be that of geometric algebra (GA). This language is based on the algebras of Clifford and Grassmann and the form we follow here is that formalised by David Hestenes [1]. There are now many texts and useful introductions to GA [2, 3, 4, 5], so we do no more here than outline why it is so useful for the problems we will discuss.



G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 104-114, 2000.

In a geometric algebra of n dimensions, we have the standard inner product which takes two vectors and produces a scalar, plus an outer or wedge product that takes two vectors and produces a new quantity we call a bivector or oriented area. Similarly, the outer product between three vectors produces a trivector or oriented volume, etc. Thus the algebra has basic elements which are oriented geometric objects of different orders. The highest order object in a given space is called the pseudoscalar, with the unit pseudoscalar denoted by $I$; e.g. in 3D $I$ is the unit trivector $e_1 \wedge e_2 \wedge e_3$, for basis vectors $\{e_i\}$. Multivectors are quantities which are made up of linear combinations of these different geometric objects. More fundamental than the inner or wedge products is the geometric product, which can be defined between any multivectors; the geometric product, unlike the inner or outer products, is invertible. For vectors the inner and outer products are the symmetric and antisymmetric parts of the geometric product:

$$ab = a \cdot b + a \wedge b \qquad (1)$$

In effect the manipulations within geometric algebra are keeping track of the objects of different grades that we are dealing with (much as complex number arithmetic does). For a general multivector $X$, we will use the notation $\langle X \rangle_r$ to denote the $r$th grade part of $X$.

In what follows we shall use the convention that vectors will be represented by 
non-bold lower case roman letters, while we use non-bold, upper case roman let- 
ters for multivectors - exceptions to this are stated in the text. Unless otherwise 
stated, repeated indices will be summed over. 



2 Rotations 



If, in 3D, we consider a rotation to be made up of two consecutive reflections, one in the plane perpendicular to a unit vector $m$ and the next in the plane perpendicular to a unit vector $n$, it can easily be shown [4] that we can represent this rotation by a quantity $R$ we call a rotor, which is given by

$$R = nm$$

Thus a rotor in 3D is made up of a scalar plus a bivector and can be written in one of the following forms

$$R = e^{-B} = \exp\left(-I\frac{\theta}{2}n\right) = \cos\frac{\theta}{2} - In\sin\frac{\theta}{2}, \qquad (2)$$

which represents a rotation of $\theta$ radians about an axis parallel to the unit vector $n$ in a right-handed screw sense. Here the bivector $B$ represents the plane of rotation. Rotors act two-sidedly, i.e. if the rotor $R$ takes the vector $a$ to the vector $b$ then

$$b = Ra\tilde{R},$$

where $\tilde{R} = mn$ is the reversion of $R$ (i.e. the order of multiplication of vectors in any part of the multivector is reversed). Rotors must therefore satisfy the constraint $R\tilde{R} = 1$. One huge advantage of this formulation is that rotors take the same form, i.e. $R = \pm\exp(B)$, in any dimension (we can define hyperplanes or bivectors in any space) and can rotate any objects, not just vectors; e.g.

$$R(a \wedge b)\tilde{R} = \langle Rab\tilde{R} \rangle_2 = \langle Ra\tilde{R}\,Rb\tilde{R} \rangle_2 = Ra\tilde{R} \wedge Rb\tilde{R} \qquad (3)$$

gives the formula for rotating a bivector.
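The two-sided action $b = Ra\tilde{R}$ can be tried out numerically. The sketch below is an illustration under a stated assumption: it stores a 3D rotor as a unit quaternion $(w, x, y, z)$, using the identification of the units $i, j, k$ with $-Ie_1, -Ie_2, -Ie_3$, so that the rotor of Eq. (2) becomes $\cos(\theta/2) + \sin(\theta/2)(n_x i + n_y j + n_z k)$ and reversion becomes quaternion conjugation. The function names are hypothetical and no geometric algebra library is used.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions p and q given as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def rotor(theta, n):
    """Rotor for a right-handed rotation by theta about the unit axis n."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return (c, s * n[0], s * n[1], s * n[2])

def reverse(r):
    """Reversion of a rotor (quaternion conjugate in this representation)."""
    return (r[0], -r[1], -r[2], -r[3])

def rotate(r, a):
    """Apply the rotor two-sidedly to the vector a: b = R a R~."""
    _, x, y, z = qmul(qmul(r, (0.0, a[0], a[1], a[2])), reverse(r))
    return (x, y, z)

R = rotor(math.pi / 2, (0.0, 0.0, 1.0))   # quarter turn about e3
b = rotate(R, (1.0, 0.0, 0.0))            # image of e1
unit = qmul(R, reverse(R))                # should equal the scalar 1
```

In this example the quarter turn about $e_3$ carries $e_1$ to $e_2$, which matches the right-handed screw sense stated for Eq. (2).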

Before we leave the topic of rotations, we will outline one property of rotors 
which will turn out to be familiar to us from classical Euler angle descriptions 
of 3D rotations. Consider an orthonormal basis for 3-space, $\{e_1, e_2, e_3\}$; suppose 
we perform a rotation $R_1$, where $R_1 = \exp(-I\frac{\theta_1}{2}e_1)$, i.e. we first rotate an angle $\theta_1$ 
about an axis $e_1$. We then follow this by a rotation of $\theta_2$ about the rotated $e_2$ 
axis; this second rotor, $R_2$, is given by 

$$R_2 = \exp\left(-I\frac{\theta_2}{2} R_1 e_2 \tilde{R}_1\right).$$

The combined rotation is therefore given by $R_T = R_2 R_1$; this can be written 
as follows: 

$$R_T = \{\cos(\theta_2/2) - I R_1 e_2 \tilde{R}_1 \sin(\theta_2/2)\} R_1$$
$$\quad = R_1 \{\cos(\theta_2/2) - I e_2 \sin(\theta_2/2)\} \tilde{R}_1 R_1$$
$$\quad = R_1 R_2' \qquad (4)$$

since $\tilde{R}_1 R_1 = 1$ and $R_1 \alpha \tilde{R}_1 = \alpha$ for $\alpha$ a scalar. Thus if $R_2'$ is the rotation of $\theta_2$ 
about the non-rotated axis (i.e. just $e_2$ in this case), we see that the compound 
rotation can be written in two ways 

$$R_2 R_1 = R_1 R_2' \qquad (5)$$
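
The relation between rotations about rotated and non-rotated axes can be checked numerically. The sketch below is a minimal illustration under the assumption that a rotor $s + Iv$ is stored as the pair `(s, v)`; the helper names and the test angles are arbitrary choices, not taken from the paper.

```python
import numpy as np

def rotor(n, theta):
    """Rotor cos(theta/2) - I n sin(theta/2), stored as (s, v) meaning s + I v."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    return np.cos(theta / 2.0), -np.sin(theta / 2.0) * n

def mul(R1, R2):
    """Geometric product of two even elements s + I v."""
    s1, v1 = R1
    s2, v2 = R2
    return s1 * s2 - np.dot(v1, v2), s1 * v2 + s2 * v1 - np.cross(v1, v2)

def rotate(R, a):
    """Two-sided action R a R~ on a vector a."""
    s, v = R
    a = np.asarray(a, dtype=float)
    return a - 2.0 * s * np.cross(v, a) + 2.0 * np.cross(v, np.cross(v, a))

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
theta1, theta2 = 0.6, -1.1               # arbitrary test angles

R1 = rotor(e1, theta1)                   # first rotation, about e1
R2 = rotor(rotate(R1, e2), theta2)       # second rotation, about the rotated e2 axis
R2_fixed = rotor(e2, theta2)             # same angle about the non-rotated e2 axis

lhs = mul(R2, R1)                        # rotated-axis form
rhs = mul(R1, R2_fixed)                  # non-rotated-axis form, applied in reverse order
```

Both products agree component by component, which is the numerical counterpart of moving the rotor $R_1$ through the second rotation.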

Now recall the classical Euler angle formulation: any general rotation can be 
expressed as follows: a rotation of $\phi$ about the $e_3$ axis, followed by a rotation of 
$\theta$ about the rotated $e_1$ axis, followed by a rotation of $\psi$ about the rotated $e_3$ 
axis [6], as shown in figure 1. 

A common task is to apply such a rotation to a vector $x$. In 
GA terms we have 3 rotors representing the 3 rotations: 

$$R_1 = \exp\{-I\tfrac{\phi}{2}e_3\}, \quad R_2 = \exp\{-I\tfrac{\theta}{2}e_1'\}, \quad R_3 = \exp\{-I\tfrac{\psi}{2}e_3'\}$$

where $e_1' = R_1 e_1 \tilde{R}_1$ and $e_3' = R_2 R_1 e_3 \tilde{R}_1 \tilde{R}_2$. The combined rotor is 
$R_T = R_3 R_2 R_1$, so that $x' = R_T x \tilde{R}_T$. 

This is all very straightforward, mainly because we are dealing with active 
transformations. 

Fig. 1. Sketch of the three elementary rotations in the Euler angle formulation, 
in which initial axes $(e_1, e_2, e_3)$ are rotated to final axes $(e_1', e_2', e_3')$ 

Now, if we implement our Euler angle formulation via rotation matrices [6], 
we see that we have 3 rotation matrices: 

$$A_1 = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}, \quad
A_3 = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

which represent the rotations about the non-rotated axes and we apply these 
matrices in reverse order to form 

$$A_T = A_1 A_2 A_3 \quad \text{so that} \quad x' = A_T x.$$

If $R_1'$, $R_2'$, $R_3'$ are the rotors representing the rotations encoded in $A_1, A_2, A_3$ 
(i.e. rotations about the non-rotated axes), then we therefore see that (noting 
$R_1' = R_1$) 

$$R_T = R_1' R_2' R_3' = R_3 R_2 R_1,$$

which is precisely the formula that we know relates rotations about rotated and 
non-rotated axes given in equation 5. Confusion often arises due to the passive 
nature of the Euler angle formulation as given in standard textbooks; there is 
no such confusion possible if we work totally with active transformations, as one 
is forced to do with the rotor formulation. 

3 Articulated Motion Models: Forward Kinematics and 
Tracking 

We begin by considering a simple model of a leg as two linked rigid rods shown 
in figure 2. Let us assume that the first rod, AB, can rotate with all degrees 
of freedom about point A but that the second rod, BC, can only rotate in 
the plane formed by the two rods (i.e. about an axis which is perpendicular 
to both rods and initially aligned with the $e_2$ axis). In reality more complex 
constraints can be considered. $e_1, e_2, e_3$ form a fixed orthonormal basis oriented 

Fig. 2. Two linked rigid rods used to simulate the leg 



as shown. $x_A, x_B, x_C$ are the vectors representing the 3D positions of A, B, C 
respectively and $x_1 = x_B - x_A$ and $x_2 = x_C - x_B$, which, initially, take the 
values $d_1 e_1$ and $d_2 e_1$. We can immediately write down the position of points B 
and C as 

$$x_B = x_A + d_1 R_1 e_1 \tilde{R}_1 \qquad (6)$$

$$x_C = x_B + d_2 R_2 R_1 e_1 \tilde{R}_1 \tilde{R}_2 \qquad (7)$$

where we have $R_1 = \exp\{-I\frac{\theta_1}{2}n_1\}$ and we allow for the fact that the point A 
may move in space (note here that we allow any rotation of rod AB about A, 
although we may want to only have 2 dof rather than 3 if we are not interested 
in the orientation of the axes at A). We also have that $R_2 = \exp\{-I\frac{\theta_2}{2}n_2'\}$ with 
$n_2' = R_1 e_2 \tilde{R}_1$. Using the fact that $R_2 R_1 = R_1 R_2'$ with $R_2' = \exp\{-I\frac{\theta_2}{2}e_2\}$ we 
are able to give the position of the ankle, $x_C$, as 

$$x_C = x_A + d_1 R_1 e_1 \tilde{R}_1 + d_2 R_1 R_2' e_1 \tilde{R}_2' \tilde{R}_1 = x_A + R_1\{d_1 e_1 + d_2 R_2' e_1 \tilde{R}_2'\}\tilde{R}_1 \qquad (8)$$

Thus we are able to write down, in a manner which deals only with active trans- 
formations, such forward kinematics equations for arbitrarily complex mecha- 
nisms. But this is not the only advantage of this approach; we can now have 
the elements of our state as rotors — it is well known that singularities can occur 
using Euler angles (i.e. when an angle goes to zero, 90° or other specific ranges) 
and we can avoid many of these singularities using the rotor components as our 
variables. The use of such models in optical tracking scenarios is briefly discussed 
here. 
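
The forward kinematics of the two-rod leg can be sketched in a few lines of Python. This is a minimal illustration, assuming a rotor $s + Iv$ is stored as the pair `(s, v)`; the function name `leg_positions` and its argument names are hypothetical. The knee rotor is taken about the non-rotated $e_2$ axis and composed on the right of the hip rotor, following the ankle-position formula above.

```python
import numpy as np

def rotor(n, theta):
    """Rotor cos(theta/2) - I n sin(theta/2), stored as (s, v) meaning s + I v."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    return np.cos(theta / 2.0), -np.sin(theta / 2.0) * n

def mul(R1, R2):
    """Geometric product of two even elements s + I v."""
    s1, v1 = R1
    s2, v2 = R2
    return s1 * s2 - np.dot(v1, v2), s1 * v2 + s2 * v1 - np.cross(v1, v2)

def rotate(R, a):
    """Two-sided action R a R~ on a vector a."""
    s, v = R
    a = np.asarray(a, dtype=float)
    return a - 2.0 * s * np.cross(v, a) + 2.0 * np.cross(v, np.cross(v, a))

def leg_positions(x_a, R1, theta2, d1, d2):
    """Positions of B and C for hip rotor R1 and knee angle theta2 (hypothetical helper)."""
    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])
    R2_fixed = rotor(e2, theta2)                        # knee rotor about non-rotated e2
    x_b = np.asarray(x_a, dtype=float) + d1 * rotate(R1, e1)
    x_c = x_b + d2 * rotate(mul(R1, R2_fixed), e1)      # R1 R2' e1 R2'~ R1~
    return x_b, x_c

# Hip at rest, knee bent by 90 degrees about e2: the second rod points along -e3.
R_rest = rotor([0.0, 0.0, 1.0], 0.0)
x_b, x_c = leg_positions(np.zeros(3), R_rest, np.pi / 2.0, 1.0, 2.0)
```

Since only active rotations of fixed reference vectors appear, the rod lengths are preserved for any hip rotor and knee angle.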

In a typical multi-camera tracking problem where we place markers on joints, 
the measurements (2D points in the camera planes) will be related to the state 
via a measurement equation: 



$$y(k) = H_k(x(k)) + w(k) \qquad (9)$$

where $y(k)$ is our set of measurements (observations) at time $t = k$, $x(k)$ is 
the state at time $t = k$ (parameters describing our model(s)), and $w(k)$ is a 
zero-mean random vector representing noise at the detection points. The function $H_k$ 
relates the model parameters to the observations. In this case we take our model 
parameters to be the coefficients of the bivectors representing the rotors 
($B = b_1 I e_1 + b_2 I e_2 + b_3 I e_3$) and then use expressions such as equation 8 to relate these 
to our observations. 

The process equation 



$$x(k+1) = F_k(x(k)) + v(k)$$

tells us how our system (model) evolves in time; here $v(k)$ represents the process 
noise. In the case described, $F_k$ tells us how we believe the bivectors to be 
evolving; one might argue that the variation of the bivectors will be smoother 
than the evolution of separate Euler angles. 

In general, $H_k$ will be extremely non-linear and so the above problem can 
be solved by applying an extended Kalman filter (EKF) to update our model 
estimates and predicted observations at each time step. 
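
One predict/update cycle of such a filter might look as follows. This is a generic textbook-style EKF sketch, not the authors' implementation: the function names `jacobian` and `ekf_step`, the finite-difference Jacobians and the noise covariances `Q` and `Rn` are all assumptions made for illustration. In the tracking setting the state `x` would hold the bivector coefficients and `H` would evaluate the forward kinematics followed by the camera projection.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Forward-difference Jacobian of f at x (illustrative, not tuned)."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.asarray(f(x + dx), dtype=float) - fx) / eps
    return J

def ekf_step(x, P, y, F, H, Q, Rn):
    """One EKF cycle for x(k+1) = F(x(k)) + v(k), y(k) = H(x(k)) + w(k)."""
    x = np.asarray(x, dtype=float)
    # Prediction through the process model, linearised about the current estimate.
    Fj = jacobian(F, x)
    x_pred = np.asarray(F(x), dtype=float)
    P_pred = Fj @ P @ Fj.T + Q
    # Measurement update, linearised about the predicted state.
    Hj = jacobian(H, x_pred)
    S = Hj @ P_pred @ Hj.T + Rn
    K = P_pred @ Hj.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (np.asarray(y, dtype=float) - np.asarray(H(x_pred), dtype=float))
    P_new = (np.eye(x.size) - K @ Hj) @ P_pred
    return x_new, P_new

# Toy usage: identity process and measurement models with equal uncertainties.
x1, P1 = ekf_step(np.zeros(2), np.eye(2), np.array([2.0, 2.0]),
                  lambda x: x, lambda x: x, np.zeros((2, 2)), np.eye(2))
```

In the toy usage the prior and the measurement carry equal weight, so the updated estimate lies halfway between them.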

A detailed comparison of the difference between using Euler-angles and using 
bivector coefficients as the scalar model parameters in such tracking problems 
will be given elsewhere. 



4 Inverse Kinematics (IK) 

Inverse kinematics is the procedure of recovering the model or state parameters 
given the measurements - in particular, when incomplete sets of measurements 
are given (i.e. not all the joint coordinates) we can, in certain cases, recover a 
unique model or a specified family of solutions. In this section we shall outline the 
use of GA in solving IK problems by consideration of a particular, fairly simple, 
example. The example we choose is the following (it is one which often appears 
in standard texts); a system consisting of three linked rigid rods representing 
a typical insect leg - such a setup is commonly used in walking robots and 
is illustrated in figure 3. Here we fix a set of axes represented by unit vectors 
$(e_1, e_2, e_3)$ (note that in the figure, $-e_1, -e_2, e_3$ are shown) at the basal joint, 
so that the angle of the first link, or coxa, is given by the Euler angles $(\theta, \lambda, \mu)$, 
and the rotor representing this rotation is 

$$R_A = R_\mu R_\lambda R_\theta \qquad (10)$$

Fig. 3. Three linked rigid rods representing the leg of an insect 

Generally the angles $(\lambda, \mu)$ are taken as known, so that $\theta$ alone describes the 
position of the first link. The second (femur) and third (tibia) links are such that 
only rotation in the plane of the three links is allowed, so that the positions of 
the leg are fully described by two further angles, $\phi$ and $\psi$, as shown in figure 3. 
If we take our initial configuration to be that in which the leg is fully extended 
with all links lying along the rotated (by $R_A$) $e_2$ direction (i.e. $\phi = \psi = 0$), 
then the rotations at joints B and C are given by 

$$R_B = e^{I\frac{\phi}{2}e_1'} \quad \text{and} \quad R_C = e^{I\frac{\psi}{2}e_1''} \qquad (11)$$

where $e_1' = R_A e_1 \tilde{R}_A$ and $e_1'' = R_B R_A e_1 \tilde{R}_A \tilde{R}_B$. Note here that we are rotating 
about the $-e_1$ direction in order to give the sense of $\phi$ and $\psi$ shown in figure 3. 
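We are thus able to write down the position vectors of all joints and finally of 
the foot position $x_P$ as follows 

$$x_B = x_A + R_A(-l_1 e_2)\tilde{R}_A = x_A + R_\mu R_\lambda R_\theta(-l_1 e_2)\tilde{R}_\theta \tilde{R}_\lambda \tilde{R}_\mu \qquad (12)$$

$$x_C = x_B + R_B R_A(-l_2 e_2)\tilde{R}_A \tilde{R}_B = x_B + R_A R_B'(-l_2 e_2)\tilde{R}_B' \tilde{R}_A \qquad (13)$$

$$x_P = x_C + R_C R_B R_A(-l_3 e_2)\tilde{R}_A \tilde{R}_B \tilde{R}_C = x_C + R_A R_B' R_C'(-l_3 e_2)\tilde{R}_C' \tilde{R}_B' \tilde{R}_A \qquad (14)$$

where $R_B' = e^{I\frac{\phi}{2}e_1}$ and $R_C' = e^{I\frac{\psi}{2}e_1}$. We can therefore write $x_P$ as 

$$x_P = x_A + R_A\left[-l_1 e_2 + R_B'\{-l_2 e_2 + R_C'(-l_3 e_2)\tilde{R}_C'\}\tilde{R}_B'\right]\tilde{R}_A \qquad (15)$$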



This uniquely gives the forward kinematic equations in terms of the three 
rotors $R_A, R_B, R_C$. If one converts this to angles, one gets the following 
equations (which are conventionally obtained when one uses transformation 
matrices to denote the position of one joint relative to the previous joint [7]): 

$$p_x = (\cos\mu\cos\theta' - \cos\lambda\sin\mu\sin\theta')[l_2\cos\phi + l_3\cos(\phi+\psi) + l_1] + \sin\lambda\sin\mu\,[l_2\sin\phi + l_3\sin(\phi+\psi)]$$

$$p_y = (\sin\mu\cos\theta' + \cos\lambda\cos\mu\sin\theta')[l_2\cos\phi + l_3\cos(\phi+\psi) + l_1] - \sin\lambda\cos\mu\,[l_2\sin\phi + l_3\sin(\phi+\psi)] \qquad (16)$$

$$p_z = \sin\lambda\sin\theta'\,[l_2\cos\phi + l_3\cos(\phi+\psi) + l_1] + \cos\lambda\,[l_2\sin\phi + l_3\sin(\phi+\psi)]$$

In the above, $\theta' = \theta - \pi/2$, since the convention (following Denavit-Hartenberg) is 
to measure this basal rotation angle from the $e_2$ axis rather than from the $e_1$ axis 
as our rotor formulation has done. Now, the inverse kinematics comes in when 
we try to recover the joint angles $(\theta, \phi, \psi)$ given $(p_x, p_y, p_z)$ (and the origin of 
coordinates). Conventionally the solution is obtained by a series of fairly involved 
matrix manipulations to give the following expressions for the joint angles: 



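$$\theta' = \arctan\left(\frac{-p_x\cos\lambda\sin\mu + p_y\cos\lambda\cos\mu + p_z\sin\lambda}{p_x\cos\mu + p_y\sin\mu}\right)$$

$$\psi = \arctan\left(\frac{-\sqrt{1 - \left(\dfrac{x^2+y^2+z^2-l_2^2-l_3^2}{2 l_2 l_3}\right)^2}}{\dfrac{x^2+y^2+z^2-l_2^2-l_3^2}{2 l_2 l_3}}\right)$$

$$\phi = \arctan\left(\frac{z}{\sqrt{x^2+y^2}}\right) - \arctan\left(\frac{l_3\sin\psi}{l_2 + l_3\cos\psi}\right) \qquad (17)$$

where 

$$x = p_x\cos\mu + p_y\sin\mu - l_1\cos\theta' \qquad (18)$$

$$y = -p_x\cos\lambda\sin\mu + p_y\cos\lambda\cos\mu + p_z\sin\lambda - l_1\sin\theta' \qquad (19)$$

$$z = p_x\sin\lambda\sin\mu - p_y\sin\lambda\cos\mu + p_z\cos\lambda \qquad (20)$$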



In standard texts it is often noted that it is better to express joint angles in terms of 
arctangent functions to avoid quadrant polarities; we will return to this point later 
when discussing problems with this Euler angle formulation. Suppose that we have the 
points $x_A, x_B, x_C, x_P$; we will now show that it is straightforward, from equations 12-14, 
to recover each of the rotors $R_A, R_B, R_C$. In order to do this we shall use a simple 
result from GA (see [1] for more details). Suppose that a set of three (non-coplanar 
and not necessarily orthonormal) unit vectors $e_1, e_2, e_3$ is rotated by a rotor $R$ into a 
set of three other (necessarily non-coplanar) unit vectors $f_1, f_2, f_3$, so that $f_i = R e_i \tilde{R}$; then the unique 
rotor which performs this job is given, via its reverse, by 

$$\tilde{R} \propto 1 + e^i f_i \qquad (21)$$

where the proportionality factor is easily found by ensuring $R\tilde{R} = 1$ and $\{e^i\}$ denotes 
the reciprocal frame of $\{e_i\}$. The reciprocal frame $\{e^i\}$ is such that $e^i \cdot e_j = \delta^i_j$ and can 
be formed (for 3D) as follows 

$$e^1 = \frac{1}{\alpha} I e_2 \wedge e_3, \qquad e^2 = \frac{1}{\alpha} I e_3 \wedge e_1, \qquad e^3 = \frac{1}{\alpha} I e_1 \wedge e_2, \qquad (22, 23)$$

where $I\alpha = e_3 \wedge e_2 \wedge e_1$. 

This provides us with a remarkably easy way of extracting rotors if we know the 
joint coordinates. Let us first consider equation 12 for $R_A$. We can rewrite this equation 
as 

$$\tilde{R}_\lambda \tilde{R}_\mu (x_B - x_A) R_\mu R_\lambda = R_\theta(-l_1 e_2)\tilde{R}_\theta \qquad (24)$$

From this we can see that the vector $f_1 = -l_1 e_2$ is rotated into the vector $g_1 = 
\tilde{R}_\lambda \tilde{R}_\mu (x_B - x_A) R_\mu R_\lambda$ and also that, since $R_\theta = e^{-I\frac{\theta}{2}e_3}$, the vector $f_2 = e_3$ is rotated 
into itself, i.e. $g_2 = e_3$. From this it follows that $f_3 = I f_1 \wedge f_2$ must be rotated into $g_3 = 
I g_1 \wedge g_2$. Thus, using equations 23 we can form $\{f^i\}$ and the rotor $R_\theta$ as follows 

$$\tilde{R}_\theta \propto 1 + f^i g_i$$

where $[f_1, f_2, f_3] = [-l_1 e_2,\; e_3,\; I f_1 \wedge f_2]$ 

and $[g_1, g_2, g_3] = [\tilde{R}_\lambda \tilde{R}_\mu (x_B - x_A) R_\mu R_\lambda,\; e_3,\; I g_1 \wedge g_2] \qquad (25)$ 
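
The frame-based recovery of a rotor can be tried out numerically. The sketch below is an illustration under stated assumptions: a rotor $s + Iv$ is stored as the pair `(s, v)`, the product of two vectors $ab$ is represented as `(a.b, a x b)`, and the reciprocal frame is built with cross products, which is equivalent in 3D to the wedge-product construction. With the convention $f_i = R e_i \tilde{R}$, the sum $1 + e^i f_i$ is a positive multiple of $\tilde{R}$ whenever the scalar part of $R$ is positive, so the helper reverses it at the end. The names `reciprocal_frame` and `rotor_from_frames` are hypothetical.

```python
import numpy as np

def rotate(R, a):
    """Two-sided action R a R~ on a vector a, rotor stored as (s, v) = s + I v."""
    s, v = R
    a = np.asarray(a, dtype=float)
    return a - 2.0 * s * np.cross(v, a) + 2.0 * np.cross(v, np.cross(v, a))

def reciprocal_frame(e):
    """Rows e^1, e^2, e^3 with e^i . e_j = delta_ij, for a non-coplanar frame e."""
    e1, e2, e3 = e
    vol = np.dot(e1, np.cross(e2, e3))
    return np.array([np.cross(e2, e3), np.cross(e3, e1), np.cross(e1, e2)]) / vol

def rotor_from_frames(e, f):
    """Rotor R with f_i = R e_i R~, using R~ proportional to 1 + e^i f_i."""
    er = reciprocal_frame(e)
    s = 1.0 + sum(np.dot(er[i], f[i]) for i in range(3))   # scalar part of the sum
    v = sum(np.cross(er[i], f[i]) for i in range(3))       # bivector part (dual vector)
    norm = np.sqrt(s * s + np.dot(v, v))                   # fails for a rotation by pi
    return s / norm, -v / norm                             # reverse of the normalised sum

# A non-orthogonal frame rotated by a known rotor, then recovered.
e = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0]])
axis = np.array([1.0, 2.0, 2.0]) / 3.0
R_true = (np.cos(0.4), -np.sin(0.4) * axis)
f = np.array([rotate(R_true, ei) for ei in e])
R_rec = rotor_from_frames(e, f)
```

The same routine applies to the frames built from $f_1$, $f_2$ and $f_3 = I f_1 \wedge f_2$; in this storage convention the third vector is, up to sign, the cross product of the first two.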

Thus $R_A$ is then recovered from equation 10. Using this we can now look at equation 13, 
which can be rewritten as 

$$\tilde{R}_A(x_C - x_B)R_A = R_B'(-l_2 e_2)\tilde{R}_B' \qquad (26)$$

We can then invert as above to give 

$$\tilde{R}_B' \propto 1 + f^i g_i$$

where $[f_1, f_2, f_3] = [-l_2 e_2,\; e_1,\; I f_1 \wedge f_2]$ 

and $[g_1, g_2, g_3] = [\tilde{R}_A(x_C - x_B)R_A,\; e_1,\; I g_1 \wedge g_2] \qquad (27)$ 

Finally, $R_C'$ can be recovered by precisely the same means using 

$$\tilde{R}_C' \propto 1 + f^i g_i$$

where $[f_1, f_2, f_3] = [-l_3 e_2,\; e_1,\; I f_1 \wedge f_2]$ 

and $[g_1, g_2, g_3] = [\tilde{R}_B' \tilde{R}_A(x_P - x_C)R_A R_B',\; e_1,\; I g_1 \wedge g_2] \qquad (28)$ 

Thus, we see that we are able to invert our forward kinematic equations trivially if 
we have the coordinates of the joints. Of course, the IK problem as we described it 
involved being given only $x_A$ and $x_P$. The plan we advocate is therefore to find $x_B$ 
and $x_C$ by purely geometric means as an initial stage, followed by the rotor inversion 
process described above. To illustrate this, consider how we would find $x_B, x_C$ for the 
given example. 

Taking $x_A$ at the origin, we know that $e_3'$ and $x_P$ must define the plane in which 
all the links must lie; call this plane $\Phi$ (see figure 4). We can form $x_B$ via 

$$x_B = l_1 \frac{x_P - (x_P \cdot e_3')e_3'}{|x_P - (x_P \cdot e_3')e_3'|} \qquad (29)$$

Fig. 4. Figure illustrating setup used to determine joint positions from geometry 



There are clearly two possibilities for $x_C$, given by the intersections of the circles lying 
in the plane $\Phi$ having centres and radii given by $(x_B, l_2)$ and $(x_P, l_3)$. If we then 
define $e_\parallel = (x_P - x_B)/|x_P - x_B|$ and $e_\perp$ a vector perpendicular to $e_\parallel$ lying in $\Phi$, it 
is not hard to show that $x_C$ is given by 

$$x_C = x_B + x_\parallel e_\parallel + x_\perp e_\perp$$

where 

$$x_\parallel = \frac{l_2^2 - l_3^2 + (x_P - x_B)^2}{2|x_P - x_B|} \qquad (30)$$

$$x_\perp^2 = -\frac{[(l_2 - l_3)^2 - |x_P - x_B|^2]\,[(l_2 + l_3)^2 - (x_P - x_B)^2]}{4|x_P - x_B|^2} \qquad (31)$$
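
The two candidate positions of joint C can be computed directly from the circle-intersection formulas. The following Python sketch is a minimal illustration; the function name `knee_candidates` and the argument `plane_normal` (a unit normal of the plane containing the links, assumed perpendicular to the line from $x_B$ to $x_P$) are assumptions made here, not names from the paper.

```python
import numpy as np

def knee_candidates(x_b, x_p, l2, l3, plane_normal):
    """Both intersections of the circles (x_b, l2) and (x_p, l3) in the leg plane."""
    x_b = np.asarray(x_b, dtype=float)
    x_p = np.asarray(x_p, dtype=float)
    d_vec = x_p - x_b
    d = np.linalg.norm(d_vec)
    e_par = d_vec / d                                   # unit vector from B towards P
    e_perp = np.cross(plane_normal, e_par)              # in-plane direction normal to e_par
    e_perp = e_perp / np.linalg.norm(e_perp)
    x_par = (l2**2 - l3**2 + d**2) / (2.0 * d)
    x_perp_sq = ((l2 + l3)**2 - d**2) * (d**2 - (l2 - l3)**2) / (4.0 * d**2)
    if x_perp_sq < 0.0:
        raise ValueError("target out of reach: the circles do not intersect")
    x_perp = np.sqrt(x_perp_sq)
    base = x_b + x_par * e_par
    return base + x_perp * e_perp, base - x_perp * e_perp

# Links of length 3 and 4 spanning a distance of 5 (a right-angled configuration).
c_plus, c_minus = knee_candidates([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], 3.0, 4.0,
                                  np.array([0.0, 0.0, 1.0]))
```

The explicit out-of-reach check is the only conditional needed at this stage; the choice between the two returned points corresponds to the two solution sets of the inverse kinematics.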



When the geometry is more complex than given in this example (indeed, things will get 
more complicated if we also have prismatic joints rather than simple revolute joints) 
the joint positions, or family of joint positions are found by intersecting circles, spheres, 
planes and lines (with possible dilations) in 3D. The system that we are currently work- 
ing on performs this initial geometric stage in the 5D conformal geometric algebra [8,9]. 
This framework provides a very elegant means of dealing with incidence geometry and 
extends the functionality of projective geometry to include circles and spheres. A fea- 
ture of the conformal setting is that rotations, translations, dilations and inversions in 
3D all become rotors in 5D. 

We now return to the question of whether we gain any advantages from doing our IK 
problems in geometric algebra. In the simple case illustrated, simulations have shown 
that we can recover the rotors (there always exist two sets of solutions) exactly for 
any combination of angles - there is no need to restrict any of the angles to particular 
ranges. However, when the equations in 17 are used to recover angles, we find that 
the whole process is plagued with conditionals, i.e. the correct solutions are obtained 
only if signs of various terms are checked for various angles in various ranges. From a 
computing point of view this is expensive and may ultimately lead to hard-to-track- 
down errors. 



5 Conclusions 

In this paper we have illustrated how the geometric algebra, and particularly the rotor 
formulation within the algebra, can be used as a mathematical system in which forward 
kinematics, motion modelling and inverse kinematics can be elegantly expressed. The 
formulations given have been put to use in tracking problems in which optical motion 
capture data is tracked via constrained articulated models and in inverse kinematics 
of simple leg structures. We believe that the system as outlined here has enormous 
potential in more complex inverse kinematics problems where we would like to define 
families of possible solutions - the key here would be to do the initial geometry stage 
via a 5D conformal geometric algebra. Work in progress also includes the analysis 
of human motion data via our articulated models in an attempt to understand how 
motions are described using the rotor formulation. 

The Lie Model for Euclidean Geometry* 



Hongbo Li 

Academy of Mathematics and Systems Science, Chinese Academy of Sciences 
Beijing 100080, P. R. China 



Abstract. In this paper we investigate the Lie model of Lie sphere 
geometry using Clifford algebra. We employ it to Euclidean geometric 
problems involving oriented contact to simplify algebraic description and 
computation. 

Keywords: Euclidean geometry, Lie sphere geometry, Clifford algebra. 



1 Introduction 

According to Cecil (1992), Lie (1872) introduced his sphere geometry in his 
dissertation to study contact transformations. The subject was actively pursued 
through the early part of the twentieth century, culminating with the publication 
of the third volume of Blaschke's Vorlesungen über Differentialgeometrie 
(1929), which is devoted entirely to Lie sphere geometry and its subgeometries, 
particularly in dimensions two and three. After this, the subject fell out of favor 
until 1981, when Pinkall used it as the principal tool to classify Dupin 
hypersurfaces in $\mathbb{R}^4$ in his dissertation. Since then, it has been employed by several 
differential geometers to study Dupin, isoparametric and taut submanifolds (e.g. 
Cecil and Chern, 1987). It has also been used by Wu (1984/1994) in automated 
geometry theorem proving. 

Despite its important role in differential geometry, Lie sphere geometry 
has limited applicability in classical geometry. This is because a Lie sphere 
transformation has a classical geometric interpretation only when it is a Möbius 
transformation or the orientation-reversing transformation. In other words, a 
general Lie sphere transformation does NOT have a classical geometric 
interpretation. 

For classical geometry, Lie sphere geometry can contribute to simplifying the 
description and computation of tangencies of spheres and hyperplanes. Because of 
this, we would like to "attach" Lie sphere geometry to the homogeneous model of 
Euclidean geometry in (Li, Hestenes and Rockwood, 2000a) as a supplementary 
tool. This goal is achieved in this paper. 

The tool, called the Lie model, is essentially a coordinate-free reformulation 
of Lie sphere geometry for the purpose of applying it to Euclidean geometry. It 
is used to solve geometric problems involving oriented contact of spheres and 
hyperplanes, and can help in obtaining simplifications. Unfortunately, this may 
be as much as it can contribute to classical geometry. The model can also be 
extended to spherical and hyperbolic geometries via their homogeneous models 
in (Li, Hestenes and Rockwood, 2000b, c). 

* This paper is supported partially by the Grant NKBRSF of China, the Hundred 
People Program of the Chinese Academy of Sciences, and the Qiu Shi Science and 
Technology Foundations of Hong Kong. 

This paper is arranged as follows: in section 2 we introduce the Lie model 
using the language of Clifford algebra (Hestenes and Sobczyk, 1984); in section 3 
we investigate further basic properties of this model; in section 4 we provide 
examples to illustrate how to apply it to Euclidean geometry. 



2 The Lie Model 

The Lie model will be established upon the homogeneous model. So first let 
us review the homogeneous model for Euclidean geometry (Li, Hestenes and 
Rockwood, 2000a; Li, 1998). 



2.1 The Homogeneous Model 

Let $\{e_1, \ldots, e_n\}$ be an orthonormal basis of $\mathbb{R}^n$. A point $c$ of $\mathbb{R}^n$ corresponds 
to the vector from the origin to the point, denoted by $c$ as well. The origin 
corresponds to the zero vector. 

We embed $\mathbb{R}^n$ into a Minkowski space of $n + 2$ dimensions as a subspace. 
Denote the Minkowski space by $\mathbb{R}^{n+1,1}$. Let $\{e_{-2}, e_{-1}, e_1, \ldots, e_n\}$ be an 
orthonormal basis of $\mathbb{R}^{n+1,1}$, where $-e_{-2}^2 = e_{-1}^2 = 1$. The 2-space spanned 
by $e_{-2}, e_{-1}$ is Minkowski, and has two null 1-subspaces. Let $e, e_0$ be null vectors 
in the two 1-subspaces respectively. Rescale them to make $e \cdot e_0 = -1$. 

Now we map $\mathbb{R}^n$ in a one-to-one manner into the null cone of $\mathbb{R}^{n+1,1}$ as 
follows: 

$$c \mapsto \acute{c} = e_0 + c + \frac{c^2}{2}e, \quad \text{for } c \in \mathbb{R}^n. \qquad (2.1)$$

The range of the mapping is 

$$\{x \in \mathbb{R}^{n+1,1} \mid x^2 = 0,\ x \cdot e = -1\}. \qquad (2.2)$$

By this mapping, a point $c$ of $\mathbb{R}^n$ can be represented by the null vector $\acute{c}$. In 
particular, the origin of $\mathbb{R}^n$ can be represented by $e_0$. Vector $e$ corresponds to 
the unique point at infinity for the compactification of $\mathbb{R}^n$. This representation 
is called the homogeneous model for Euclidean geometry. 

The homogeneous model can also be described as follows. Any null vector $x$ 
of $\mathbb{R}^{n+1,1}$ represents a point or the point at infinity of $\mathbb{R}^n$. It represents the point 
at infinity if and only if $x \cdot e = 0$. Two null vectors represent the same point or 
the point at infinity if and only if they differ by a nonzero scale. 
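
The embedding can be checked with a small coordinate sketch. The Python code below is an illustration under an assumed storage layout: a vector of the Minkowski space is held as the array `[coefficient of e_0, coefficient of e, Euclidean components]`, and the inner product uses $e_0^2 = e^2 = 0$ and $e \cdot e_0 = -1$. The names `embed`, `inner` and `e_inf` are hypothetical.

```python
import numpy as np

def embed(c):
    """Null vector e_0 + c + (c^2 / 2) e, stored as [e_0 coeff, e coeff, c]."""
    c = np.asarray(c, dtype=float)
    return np.concatenate(([1.0, 0.5 * np.dot(c, c)], c))

def inner(x, y):
    """Inner product with e_0.e_0 = e.e = 0, e.e_0 = -1, Euclidean on the rest."""
    return -(x[0] * y[1] + x[1] * y[0]) + np.dot(x[2:], y[2:])

c1 = np.array([1.0, 2.0, 3.0])
c2 = np.array([-1.0, 0.0, 2.0])
e_inf = np.array([0.0, 1.0, 0.0, 0.0, 0.0])    # the null vector e (point at infinity)

null_check = inner(embed(c1), embed(c1))        # embedded points are null
scale_check = inner(embed(c1), e_inf)           # normalisation x.e = -1
pair = inner(embed(c1), embed(c2))              # encodes the squared distance
```

In this layout the inner product of two embedded points works out to minus half their squared Euclidean distance, which is what makes the model useful for metric computations.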

The following is a fundamental property of the homogeneous model: 

Theorem 2.1. Let $B_{r-1,1}$ be a Minkowski $r$-blade in $\mathcal{G}_{n+1,1}$, $2 \le r \le n + 1$. 
Then $B_{r-1,1}$ represents an $(r - 2)$-dimensional sphere or plane in the sense that 
a point represented by a null vector $a$ is on it if and only if $a \wedge B_{r-1,1} = 0$. It 
represents a plane if and only if $e \wedge B_{r-1,1} = 0$. The representation is unique up 
to a nonzero scale. 

The $(r - 2)$-dimensional sphere passing through $r$ affinely independent points 
$a_1, \ldots, a_r$ of $\mathbb{R}^n$ can be represented by $\acute{a}_1 \wedge \cdots \wedge \acute{a}_r$; the $(r - 2)$-dimensional 
plane passing through $r - 1$ affinely independent points $a_1, \ldots, a_{r-1}$ of $\mathbb{R}^n$ can 
be represented by $e \wedge \acute{a}_1 \wedge \cdots \wedge \acute{a}_{r-1}$. 

When $r = n + 1$, the dual form of the above theorem is 

Theorem 2.2. Let $s$ be a vector of $\mathbb{R}^{n+1,1}$ satisfying $s^2 > 0$. Then it represents 
a sphere or hyperplane in the sense that a point represented by a null vector $a$ is 
on it if and only if $a \cdot s = 0$. It represents a hyperplane if and only if $e \cdot s = 0$. 
The representation is unique up to a nonzero scale. 


The following are standard representations in the homogeneous model. 

1. A sphere with center $c$ and radius $\rho > 0$ is represented by $\acute{c} - \frac{\rho^2}{2}e$. 

2. A hyperplane with unit normal $n$ and distance $\delta > 0$ away from the origin 
in the direction of $n$ is represented by $n + \delta e$. 

3. A hyperspace with unit normal $n$ is represented by either of $\pm n$. 

4. A sphere with center $c$ and passing through point $a$ is represented by $\acute{a} \cdot (e \wedge \acute{c})$. 

5. A hyperplane with normal $n$ and passing through point $a$ is represented by 
$\acute{a} \cdot (e \wedge n)$. 



The following are formulas and explanations for some inner products in 
$\mathbb{R}^{n+1,1}$. 

- For two points $c_1$ and $c_2$, 

$$\acute{c}_1 \cdot \acute{c}_2 = -\frac{(c_1 - c_2)^2}{2} = -\frac{|c_1 - c_2|^2}{2}. \qquad (2.3)$$

- For point $c$ and hyperplane $n + \delta e$, 

$$\acute{c} \cdot (n + \delta e) = c \cdot n - \delta. \qquad (2.4)$$

It is positive, zero or negative if the vector from the hyperplane to the point is 
along $n$, zero or along $-n$ respectively. Its absolute value equals the distance 
between the point and the hyperplane. 

- For point $c_1$ and sphere $\acute{c}_2 - \frac{\rho^2}{2}e$, 

$$\acute{c}_1 \cdot \left(\acute{c}_2 - \frac{\rho^2}{2}e\right) = \frac{\rho^2 - |c_1 - c_2|^2}{2}. \qquad (2.5)$$

It is positive, zero or negative if the point is inside, on or outside the sphere 
respectively. Its absolute value equals half the distance between the point 
and the sphere. 

- For two hyperplanes $n_1 + \delta_1 e$ and $n_2 + \delta_2 e$, 

$$(n_1 + \delta_1 e) \cdot (n_2 + \delta_2 e) = n_1 \cdot n_2. \qquad (2.6)$$

- For hyperplane $n + \delta e$ and sphere $\acute{c} - \frac{\rho^2}{2}e$, 

$$(n + \delta e) \cdot \left(\acute{c} - \frac{\rho^2}{2}e\right) = (n + \delta e) \cdot \acute{c}. \qquad (2.7)$$

It is positive, zero or negative if the vector from the hyperplane to the center 
$c$ is along $n$, zero or along $-n$ respectively. Its absolute value equals the 
distance between the center and the hyperplane. 

- For two spheres $\acute{c}_1 - \frac{\rho_1^2}{2}e$ and $\acute{c}_2 - \frac{\rho_2^2}{2}e$, 

$$\left(\acute{c}_1 - \frac{\rho_1^2}{2}e\right) \cdot \left(\acute{c}_2 - \frac{\rho_2^2}{2}e\right) = \frac{\rho_1^2 + \rho_2^2 - |c_1 - c_2|^2}{2}. \qquad (2.8)$$

It is zero if the two spheres are perpendicular to each other. When the two 
spheres intersect, (2.8) equals the cosine of the angle of intersection multiplied by 
$\rho_1 \rho_2$. 

The following are geometric interpretations of the outer product of $s_1, s_2$, 
which are vectors of nonnegative square in $\mathbb{R}^{n+1,1}$. 

1. If $s_1 \wedge s_2 = 0$, $s_1, s_2$ represent the same geometric object. If $s_1 \wedge s_2 \neq 0$ 
but $(s_1 \wedge s_2)^2 = 0$, then 

- if $s_1$ represents a point or the point at infinity, it must be on the sphere 
or hyperplane represented by $s_2$; 

- if $s_1, s_2$ represent two spheres or a sphere and a hyperplane, they must 
be tangent to each other; 

- if $s_1, s_2$ represent two hyperplanes, they must be parallel to each other. 

In all these cases we say the geometric objects represented by $s_1$ and $s_2$ are 
in contact. 

Let $s_1 \wedge s_2 \neq 0$. The blade represents the pencil of spheres and hyperplanes 
that contact both $s_1$ and $s_2$, together with the point of contact, in the sense 
that a point, the point at infinity, a sphere or a hyperplane represented by 
a vector $s$ of nonnegative square is in contact with both $s_1, s_2$ if and only if 
$s \wedge s_1 \wedge s_2 = 0$. The pencil is called a contact pencil. 

2. If $(s_1 \wedge s_2)^2 < 0$, then $s_1, s_2$ must represent two intersecting spheres or 
hyperplanes. The blade $s_1 \wedge s_2$ represents the pencil of spheres and hyperplanes 
that pass through the intersection of $s_1$ and $s_2$, together with the points of 
intersection. The pencil is called a concurrent pencil. 

3. If $(s_1 \wedge s_2)^2 > 0$, then $s_1, s_2$ are separate from each other. $s_1 \wedge s_2$ represents 
$x, y$ which are two points or a point and the point at infinity, together with 
the pencil of spheres and hyperplanes with respect to which $x, y$ are inversive. 
The pencil is called a Poncelet pencil. 

Let $\lambda = (s_1 \cdot s_2)(s_1 \cdot e)(s_2 \cdot e)$. 

- If $s_1, s_2$ represent a point and a sphere, the point is inside the sphere if 
$\lambda > 0$, outside if $\lambda < 0$. 

- If $s_1, s_2$ represent two spheres, they are inclusive, i.e., one is inside the 
other, if $\lambda > 0$; they are exclusive, i.e., each sphere is outside the other, 
if $\lambda < 0$. 

Conformal geometry is the geometry on Möbius transformations. Möbius 
transformations of $\mathbb{R}^n$ are orthogonal transformations of $\mathbb{R}^{n+1,1}$ with $\pm\mathrm{Id}$ 
identified, where $\mathrm{Id}$ denotes the identity transformation. Möbius transformations can 
be studied by means of spinors in $\mathcal{G}_{n+1,1}$. 

2.2 Lie Spheres 

A Lie sphere of $\mathbb{R}^n$ refers to an oriented sphere, or an oriented hyperplane, or a 
point, or the point at infinity. First let us discuss the orientations of hyperplanes 
and spheres. 

A hyperplane has two orientations. Let $a_1, \ldots, a_n$ be $n$ affinely independent 
points of $\mathbb{R}^n$, i.e., $J_{n-1} = \partial(a_1 \wedge \cdots \wedge a_n) \neq 0$, where 

$$\partial(a_1 \wedge \cdots \wedge a_n) = \sum_{i=1}^{n} (-1)^{i+1} a_1 \wedge \cdots \wedge \check{a}_i \wedge \cdots \wedge a_n. \qquad (2.9)$$

$\check{a}_i$ denotes that $a_i$ does not occur in the outer product. The $n$ points generate 
a hyperplane of $\mathbb{R}^n$, and $J_{n-1}$ determines an orientation of the hyperplane. 
Alternatively, the vector 

$$n = (-1)^{n-1} J_{n-1}^{\sim} \qquad (2.10)$$

is normal to the hyperplane, and satisfies $(J_{n-1} \wedge n)^{\sim} > 0$. It can be used 
to indicate the same orientation. The two normal directions indicate the two 
orientations of the hyperplane. 

A sphere also has two orientations. Let $a_1, \ldots, a_{n+1}$ be $n + 1$ affinely 
independent points of $\mathbb{R}^n$. Then $J_n = \partial(a_1 \wedge \cdots \wedge a_{n+1}) \neq 0$ and determines an 
orientation of the sphere. If $J_n^{\sim} > 0$, the sphere is said to have positive 
orientation; otherwise it is said to have negative orientation. 

Let $a$ be a point on the sphere, then the blade $J_{n-1} = a \cdot J_n$ determines 
an orientation of the hyperplane tangent to the sphere at $a$, called the induced 
orientation of the tangent hyperplane. The normal direction of the hyperplane 
with the induced orientation, called the induced radial direction of the sphere at 
$a$, is 

$$n = (-1)^{n-1} J_n^{\sim} a. \qquad (2.11)$$

$n$ is in the direction of $(-1)^{n-1} a$ if and only if the sphere has positive 
orientation. The two radial directions (inward and outward directions) indicate the two 
orientations of the sphere. For even dimensional spaces, the positive orientation 
of a sphere is inward, while for odd dimensional spaces it is outward. 

An oriented hyperplane and an oriented sphere are said to be in oriented 
contact if they are tangent to each other and at the point of tangency the normal 
direction of the oriented hyperplane is the induced radial direction of the oriented 
sphere. Two oriented spheres are said to be in oriented contact if they are tangent 
to each other and at the point of tangency they have the same induced radial 
directions. 

Fig. 1. Oriented contact 



In the previous subsection, we have seen that a sphere or hyperplane can 
be represented by a Minkowski $(n + 1)$-blade (or dually by a vector of positive 
square) in $\mathcal{G}_{n+1,1}$. Two such blades (or vectors) represent the same sphere or 
hyperplane if and only if they differ by a nonzero scale. Since a blade $B_{n,1}$ (or 
vector $s$) also represents an oriented vector space, we can use $\pm B_{n,1}$ (or $\pm s$) 
to represent the same sphere or hyperplane with different orientations, i.e., two 
Minkowski $(n + 1)$-blades (or two vectors of positive square) represent the same 
oriented sphere or hyperplane if and only if they differ by a positive scale. 

A better representation is provided by Lie (1872), where an oriented sphere 
or hyperplane is represented by a null vector, and two null vectors 
represent the same oriented sphere or hyperplane if and only if they differ by a 
NONZERO scale. Lie's construction can be described as follows. $\mathbb{R}^{n+1,1}$ is 
embedded into $\mathbb{R}^{n+1,2}$ as a hyperspace. Let $\{e_{-2}, e_{-1}, e_1, \ldots, e_n\}$ be an orthonormal 
basis of $\mathbb{R}^{n+1,1}$. An orthonormal basis of $\mathbb{R}^{n+1,2}$ is $\{e_{-2}, e_{-1}, e_1, \ldots, e_n, e_{n+1}\}$, 
where $e_{n+1}^2 = -1$. Null vectors of $\mathbb{R}^{n+1,1}$ are also null vectors of $\mathbb{R}^{n+1,2}$. They 
are the set 

$$\mathcal{N}_0 = \{x \in \mathbb{R}^{n+1,2} \mid x^2 = 0,\ x \cdot e_{n+1} = 0\}. \qquad (2.12)$$

A vector in $\mathcal{N}_0$ represents a point or the point at infinity of $\mathbb{R}^n$. Two such vectors 
represent the same geometric object if and only if they differ by a nonzero scale. 
Vector $x$ represents the point at infinity if and only if $x \cdot e = 0$. 

In $\mathbb{R}^{n+1,1}$, a sphere or hyperplane is represented by a vector $s$ of positive 
square. The mapping 

$$s \mapsto s' = s + |s| e_{n+1} \qquad (2.13)$$

maps all such vectors from $\mathbb{R}^{n+1,1}$ to the set 

$$\mathcal{N}_* = \{x \in \mathbb{R}^{n+1,2} \mid x^2 = 0,\ x \cdot e_{n+1} \neq 0\}. \qquad (2.14)$$

In particular, $\pm s$ are mapped to different null vectors. Let $\tau$ be the 
transformation of $\mathbb{R}^{n+1,2}$ which changes $e_{n+1}$ to $-e_{n+1}$ while keeping $e_{-2}, e_{-1}, e_1, \ldots, e_n$ 
invariant. Then $(-s)' = -\tau s'$. A point or the point at infinity of $\mathbb{R}^n$ represented 
by a null vector $x$ is on the sphere or hyperplane represented by vector $s$ of 
$\mathbb{R}^{n+1,1}$ if and only if $x \cdot s' = 0$, or equivalently, $x \cdot (\tau s') = 0$. The equalities are 
invariant when $s'$ and $\tau s'$ are rescaled. 





Fig. 2. Lie’s construction 



Based on the mapping (2.13) and the division of null vectors into $\mathcal{N}_0$, $\mathcal{N}_*$, 
the Lie model of Lie spheres can be defined by the following theorem: 

Theorem 2.3. [Lie] Fix a Minkowski 3-blade of $\mathbb{R}^{n+1,2}$. Let $e_{n+1}$ be a 
unit vector in the blade, and $e, e_0$ be null vectors orthogonal to $e_{n+1}$ in the blade. 
Then any null vector $s$ of $\mathbb{R}^{n+1,2}$ represents a Lie sphere in the sense that a 
point represented by a null vector $x$ is on it if and only if $x \cdot s = 0$. Two null 
vectors represent the same Lie sphere if and only if they differ by a nonzero 
scale. A null vector $s$ represents the point at infinity if $s \cdot e = s \cdot e_{n+1} = 0$, a 
point if $s \cdot e_{n+1} = 0$, $s \cdot e \neq 0$, an oriented hyperplane if $s \cdot e = 0$, $s \cdot e_{n+1} \neq 0$, 
and an oriented sphere otherwise. This algebraic representation of Lie spheres is 
called the Lie model. 

The following are standard representations of Lie spheres: 

1. The point at infinity is represented by e. 

2. Point c of 72." is represented by c. 
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3. The hyperplane with unit normal n and distance <5 > 0 away from the origin 
in the direction of n has two orientations. The oriented hyperplane with 
normal n is represented by n + <5e + e„+i; the oriented hyperplane with 
normal — n is represented by n + <5e — e„+i. 

4. The sphere with center c and radius p > 0 has two orientations. The oriented 

sphere with inward orientation is represented by c — —e + pe„+i; the one 

p^ 

with outward orientation is represented by c — —e — pcn+i- 

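Let $s_1, s_2$ be null vectors of $\mathcal{R}^{n+1,2}$, and let $\epsilon, \epsilon_1, \epsilon_2 = \pm 1$. The following are
formulas on the inner product $s_1 \cdot s_2$:

- If $s_1 = c_1$, $s_2 = c_2$, then

    $s_1 \cdot s_2 = -\frac{|c_1 - c_2|^2}{2}$.    (2.15)

- If $s_1 = c$, $s_2 = n + \delta e + \epsilon e_{n+1}$, then

    $s_1 \cdot s_2 = c \cdot n - \delta$.    (2.16)

- If $s_1 = c_1$, $s_2 = c_2 - \frac{\rho^2}{2} e + \epsilon \rho e_{n+1}$, then

    $s_1 \cdot s_2 = \frac{\rho^2 - |c_1 - c_2|^2}{2}$.    (2.17)

- If for $i = 1, 2$, $s_i = n_i + \delta_i e + \epsilon_i e_{n+1}$, then

    $s_1 \cdot s_2 = \epsilon_1 \epsilon_2\, n_1 \cdot n_2 - 1$.    (2.18)

- If $s_1 = n + \delta e + \epsilon_1 e_{n+1}$, $s_2 = c - \frac{\rho^2}{2} e + \epsilon_2 \rho e_{n+1}$, then

    $s_1 \cdot s_2 = c \cdot n - \delta - \epsilon_1 \epsilon_2 \rho$.    (2.19)

- If for $i = 1, 2$, $s_i = c_i - \frac{\rho_i^2}{2} e + \epsilon_i \rho_i e_{n+1}$, then

    $s_1 \cdot s_2 = \frac{(\rho_1 - \epsilon_1 \epsilon_2 \rho_2)^2 - |c_1 - c_2|^2}{2}$.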



The oriented contact distance between two Lie spheres $s_1, s_2$ is defined by
$|s_1 - s_2| = \sqrt{2 |s_1 \cdot s_2|}$, where the null vectors take the forms in the above
formulas. Two Lie spheres are in oriented contact if and only if their oriented
contact distance is zero.



- When $s_1, s_2$ are both points, $|s_1 - s_2|$ equals the distance between them.

- When one is a point and the other is a hyperplane, $|s_1 - s_2|$ equals $\sqrt{2}$ times
the distance between them.

- When one is a point and the other is a sphere, $|s_1 - s_2|$ equals the distance
between them.

- When both are hyperplanes, $|s_1 - s_2|$ equals $2 \sin\frac{\theta}{2}$, where $\theta$ is the angle
between vectors $\epsilon_1 n_1, \epsilon_2 n_2$.

- When one is a hyperplane and the other is a sphere, the set of signed
distances from the hyperplane to the points on the sphere has a unique
maximum and a unique minimum, denoted by $d_{\max}$ and $d_{\min}$ respectively. Let $\epsilon$
be the sign of $c \cdot n - \delta$ when it is nonzero.

• If $c \cdot n - \delta = 0$, then $|s_1 - s_2|$ equals the radius of the sphere.

• If $\epsilon \epsilon_1 \epsilon_2 = 1$, then $|s_1 - s_2| = \sqrt{2}\,|d_{\max}|$. In particular, when $|s_1 - s_2| = 0$,
the hyperplane and the sphere are in oriented contact.

• If $\epsilon \epsilon_1 \epsilon_2 = -1$, then $|s_1 - s_2| = \sqrt{2}\,|d_{\min}|$. In particular, when $|s_1 - s_2| = 0$,
the hyperplane and the sphere are in oriented contact.

- When both are spheres, then

• if they have the same orientation and are not inclusive, $|s_1 - s_2|$ equals
the outer tangential distance between the two spheres, i.e., the distance
between the two points of tangency in the common tangent hyperplane
of which the spheres are on the same side; in particular, if $|s_1 - s_2| = 0$,
the two spheres are inner tangent to each other;

• if they have different orientations and are outer tangent to each other,
$|s_1 - s_2| = 0$;

• if they have different orientations and are exclusive, $|s_1 - s_2|$ equals
the inner tangential distance between the two spheres, i.e., the distance
between the two points of tangency in the common tangent hyperplane
of which the spheres are on different sides.
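The inner products and the oriented contact distance listed above can be checked numerically. The following is a minimal sketch, not the author's implementation. It assumes the usual conventions of the homogeneous model, namely $e \cdot e = e_0 \cdot e_0 = 0$, $e \cdot e_0 = -1$, $e_{n+1}^2 = -1$, and a point $c$ embedded as $e_0 + c + \frac{|c|^2}{2} e$; the coefficient layout and the function names `inner`, `point`, `sphere`, `hyperplane` and `contact_distance` are hypothetical.

```python
import math

def inner(u, v):
    """Inner product in R^{n+1,2} for coefficient tuples (a_e, a_e0, x_1..x_n, b).

    Assumed conventions: e.e = e0.e0 = 0, e.e0 = -1, the Euclidean part is
    orthonormal, and e_{n+1}.e_{n+1} = -1 (b is the e_{n+1} coefficient).
    """
    euclid = sum(p * q for p, q in zip(u[2:-1], v[2:-1]))
    return -(u[0] * v[1] + u[1] * v[0]) + euclid - u[-1] * v[-1]

def point(c):
    """Null vector e0 + c + (|c|^2 / 2) e representing the point c."""
    return (0.5 * sum(x * x for x in c), 1.0, *c, 0.0)

def sphere(c, rho, eps):
    """Oriented sphere c - (rho^2 / 2) e + eps * rho * e_{n+1}; eps = +1 is inward."""
    p = point(c)
    return (p[0] - 0.5 * rho * rho, p[1], *p[2:-1], eps * rho)

def hyperplane(n, delta, eps):
    """Oriented hyperplane n + delta * e + eps * e_{n+1}, with n a unit normal."""
    return (delta, 0.0, *n, float(eps))

def contact_distance(s1, s2):
    """Oriented contact distance sqrt(2 |s1 . s2|)."""
    return math.sqrt(2.0 * abs(inner(s1, s2)))

# Two circles of radii 1 and 2 whose centers are 3 apart touch externally;
# with opposite orientations their inner product vanishes (oriented contact).
touching = inner(sphere((0.0, 0.0), 1.0, +1), sphere((3.0, 0.0), 2.0, -1))

# With equal orientations and centers 5 apart, the contact distance is the
# outer tangential distance sqrt(5^2 - (2 - 1)^2).
outer = contact_distance(sphere((0.0, 0.0), 1.0, +1), sphere((5.0, 0.0), 2.0, +1))
```

In this sketch a Lie sphere is stored with the fixed scale used in formulas (2.15) to (2.19); rescaling a null vector changes the inner product but not whether it is zero.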

2.3 Lie Sphere Geometry 

Lie sphere transformations are orthogonal transformations of $\mathcal{R}^{n+1,2}$ with $\pm Id$
identified. Geometrically, Lie sphere transformations are transformations in the
set of Lie spheres preserving oriented contact. Lie sphere geometry is the
geometry on Lie sphere transformations.

Laguerre transformations are Lie sphere transformations keeping the 1-subspace
spanned by $e$ invariant. Geometrically, Laguerre transformations are Lie sphere
transformations keeping the set of hyperplanes invariant. Laguerre geometry is
the geometry on Laguerre transformations.

Möbius transformations are Lie sphere transformations keeping vector $e_{n+1}$
invariant.

From the definition of Lie sphere transformations, it is clear that spinors can 
play an important role in the study of Lie sphere transformations and Laguerre 
transformations. These are not to be discussed in this paper. 

3 Further Properties of the Lie Model 

In this section we further investigate basic properties of the Lie model for the 
purpose of applying it to Euclidean geometry. All the results hold for n dimen- 




124 Hongbo Li 



sions by obvious revisions. We let n = 2 here only to make the material more 
easily understood. 

The Lie model for the plane is in the space $\mathcal{R}^{3,2}$. Let $I_2$ be a unit 2-blade
determining the orientation of $\mathcal{R}^2$. The orientation of $\mathcal{R}^{3,2}$ is determined by the
unit pseudoscalar $I_{3,2} = e \wedge e_0 \wedge I_2 \wedge e_3$. We have

-^3,2 = ^3,2 = -^ 3 , 2 - ( 3 - 1 ) 



3.1 One Lie Circle 

Since $n = 2$, the positive orientation of a circle is inward. Let $s$ be a null vector
in $\mathcal{R}^{3,2}$. Then $(s \cdot e)(s \cdot e_3) > 0$ if $s$ represents a positive circle; $(s \cdot e)(s \cdot e_3) < 0$ if it
represents a negative circle.

- Let $c_1, c_2, c_3$ be three non-collinear points. The oriented circle passing
through them and whose orientation is from $c_1$ to $c_2$ to $c_3$ can be
represented by

s = (ci A 62 A 63)63 - |ci A 62 A C3|e3. ( 3 . 2 ) 

Notice the negative sign. One can verify that $s \cdot e > 0$ if the orientation from
$c_1$ to $c_2$ to $c_3$ is positive. For $n$ dimensions the sign is $(-1)^{n-1}$.

- Let $c_1, c_2$ be two distinct points. The directed line passing through them
and whose direction is from $c_1$ to $c_2$ can be represented by

(e A 61 A C2)e'^ -I- |e A 61 A C2|e3 = (e A 61 A 62)63 -I- |ci - C2|e3. ( 3 . 3 ) 

- Let $c$ be a point, and let $a$ be a unit vector of $\mathcal{R}^2$. The directed line passing
through $c$ and with direction $a$ can be represented by

(e A 6 A a)e|(" -I- |e A 6 A a|e3 = (e A 6 A a)c3 -I- 63. ( 3 . 4 ) 

— The inward circle with center c and passing through point a can be repre- 
sented by 

a • (e A 6) — |(e A 6) • ajeg = a • (e A 6) — |c — ajeg. ( 3 . 5 ) 

3.2 Two Lie Circles 

Let $s_1, s_2$ represent two distinct Lie circles. They represent the same circle or
line with opposite orientations if and only if

    $e_3 \wedge s_1 \wedge s_2 = 0$.    (3.6)

The blade $s_1^\sim$ can represent the set of Lie circles that are in oriented contact
with the Lie circle $s_1$, in the sense that a Lie circle $s$ is in oriented contact with
$s_1$ if and only if $s \wedge s_1^\sim = 0$. For two Lie circles, the blade $(s_1 \wedge s_2)^\sim = s_1^\sim \vee s_2^\sim$
represents the set of Lie circles that are in oriented contact with both Lie circles.
For example, if we use to denote that the two sides of the symbol are 
equal up to a nonzero scale, then for an oriented circle or line s, the blade 
(s A (ts))'^ ~ (63 A s)'^ represents points on the circle, or points on the line and 
the point at infinity. 

The blade $T_3 = (s_1 \wedge s_2)^\sim$ has two possibilities:

1. If $s_1 \cdot s_2 = 0$, then $s_1$ and $s_2$ are in oriented contact, and $T_3$ represents a parabolic
pencil of Lie circles, i.e., the set of Lie circles that are in oriented contact with
both Lie circles at the point or point at infinity where the two Lie circles are
in oriented contact. It is a contact pencil of circles and lines together with
the common point of contact, with the circles and lines assigned compatible
orientations.

2. If $s_1 \cdot s_2 \neq 0$, then $T_3$ is Minkowski. The set of common oriented contact Lie
circles is topologically a circle.

Let $s_1, s_2$ be two circles. They are inclusive if and only if $(e_3 \wedge s_1 \wedge s_2)^2 < 0$
and $(e \wedge s_1 \wedge s_2)^2 < 0$; they are exclusive if and only if $(e_3 \wedge s_1 \wedge s_2)^2 < 0$ and
$(e \wedge s_1 \wedge s_2)^2 > 0$.

This can be proved as follows. Since $(e_3 \wedge s_1 \wedge s_2)^2 = -(s_1 \wedge s_2)^2$, the two
circles are separate if and only if $(e_3 \wedge s_1 \wedge s_2)^2 < 0$. If they are exclusive,
then they have four common tangent lines, which means that the two oriented
circles and the point at infinity have two common oriented contact Lie circles.
By Theorem 3.1 in the next subsection, this is equivalent to $(e \wedge s_1 \wedge s_2)^2 > 0$.
If they are inclusive, they do not have any common tangent line; by the same
theorem, $(e \wedge s_1 \wedge s_2)^2 < 0$.



3.3 Three Lie Circles 

Let $s_1, s_2, s_3$ be three distinct Lie circles. $s_1 \wedge s_2 \wedge s_3 = 0$ if and only if they
belong to a parabolic pencil. Assume that $T_3 = s_1 \wedge s_2 \wedge s_3 \neq 0$. Consider the
following problem: when do they have a common oriented contact Lie circle, and
what kind of common oriented contact Lie circles do they have?

Theorem 3.1. When $T_3^2 > 0$, $= 0$ or $< 0$, the number of common oriented
contact Lie circles is 2, 1 or 0 respectively.

Proof. Since $s_1, s_2, s_3$ are all null vectors, $T_3$ has only three possible
signatures: (2, 1, 0), (1, 1, 1), (1, 2, 0). In the three cases, $T_3^2 > 0$, $= 0$, $< 0$
respectively. The corresponding 2-blade $T_3^\sim$ has the following signatures respectively:
(1, 1, 0), (1, 0, 1), (2, 0, 0). The number of null 1-subspaces in $T_3^\sim$, which equals
the number of common oriented contact Lie circles, is 2, 1, 0 respectively.

If $s_1, s_2, s_3$ are three points, there is a circle or line passing through them
with two possible orientations. If they are three pairwise intersecting oriented
lines, then besides the point at infinity, there exists another common oriented
contact Lie circle, which is either the inscribed circle or an escribed circle of the
triangle formed by the lines, depending on the orientations of the lines.

Theorem 3.2. Let $s_1, s_2, s_3$ be three oriented Lie circles having two common
oriented contact Lie circles $s_4, s_5$.

1. $s_4, s_5$ are both points if and only if the Lie circles belong to a concurrent
pencil together with the points of intersection.

2. If $s_1, s_2, s_3$ are all circles, then $s_4, s_5$ are both lines if and only if

    $\frac{c_1 - c_2}{c_1 - c_3} = \frac{\epsilon_1 \rho_1 - \epsilon_2 \rho_2}{\epsilon_1 \rho_1 - \epsilon_3 \rho_3}$.    (3.7)

3. If at least one of $s_1, s_2, s_3$ is a circle or point, then $s_4, s_5$ are two circles of
different orientations if and only if $e \wedge T_3 \neq 0$, $e_3 \wedge T_3 \neq 0$, but
$(e \wedge T_3) \cdot (e_3 \wedge T_3) = 0$.




Fig. 3. Theorem 3.2, 1 and 2

Fig. 4. Theorem 3.2, 3



Proof. 1. This can be obtained from the fact that a concurrent pencil of
circles and lines which is not a parabolic pencil must have two points as the
intersection.

2. Let $s_i = c_i + \epsilon_i \rho_i e_3$ for $i = 1, 2, 3$. $T_3^\sim$ corresponds to two lines if and only if
$e \wedge T_3 = 0$. Expanding this equality, we get

    $e \wedge c_1 \wedge c_2 \wedge c_3 = 0$,
    $e \wedge e_3 \wedge (\epsilon_1 \rho_1\, c_2 \wedge c_3 + \epsilon_2 \rho_2\, c_3 \wedge c_1 + \epsilon_3 \rho_3\, c_1 \wedge c_2) = 0$,

which can be written as (3.7).

3. Let $B_2 = s_4 \wedge s_5$. Then $s_4 \cdot s_5 \neq 0$. If at least one of $s_1, s_2, s_3$ is a circle or
point, then neither $s_4$ nor $s_5$ is collinear with $e$, i.e., $e \cdot s_4$ and $e_3 \cdot s_4$ cannot
both be zero, and the same is true for $s_5$.

So $B_2$ corresponds to two points if and only if $e_3 \cdot s_4 = e_3 \cdot s_5 = 0$, and
corresponds to two lines if and only if $e \cdot s_4 = e \cdot s_5 = 0$. When $B_2$ corresponds
to neither two points nor two lines, then $e_3 \cdot s_4$ and $e_3 \cdot s_5$ cannot both be
zero, and the same is true when $e_3$ is replaced by $e$. In this case, from

    $(e \wedge T_3) \cdot (e_3 \wedge T_3) = (e \cdot B_2) \cdot (e_3 \cdot B_2) = -s_4 \cdot s_5\, (e \cdot s_4\; e_3 \cdot s_5 + e \cdot s_5\; e_3 \cdot s_4)$    (3.8)

we get that (3.8) equals zero if and only if $\frac{e \cdot s_4}{e_3 \cdot s_4} = -\frac{e \cdot s_5}{e_3 \cdot s_5}$,
which is neither zero nor infinity. So $s_4, s_5$ must represent two circles with
different orientations.

When $s_1, s_2, s_3$ have a unique common oriented contact Lie circle, then at
least one of the $s_i \cdot s_j$, $1 \le i < j \le 3$, equals zero, but not all of them are zero.
Assume that $s_1 \cdot s_2 \neq 0$. Let

    $t = (s_1 \wedge s_2) \cdot (s_1 \wedge s_2 \wedge s_3)$.    (3.9)

Then $t$ is a null vector representing the common oriented contact Lie circle.

The unique common oriented contact Lie circle of three Lie circles $s_1, s_2, s_3$
is a point if and only if

    $(e_3 \wedge s_i \wedge s_j) \cdot (s_1 \wedge s_2 \wedge s_3) = 0$    (3.10)

for any $1 \le i < j \le 3$; it is a line if and only if

    $(e \wedge s_i \wedge s_j) \cdot (s_1 \wedge s_2 \wedge s_3) = 0$.    (3.11)



3.4 Four Lie Circles 

If four Lie circles $s_i$, $i = 1, \ldots, 4$ have a common contact Lie circle, the vector
$x = (s_1 \wedge s_2 \wedge s_3 \wedge s_4)^\sim$ is either zero or null. In both cases we have

    $(s_1 \wedge s_2 \wedge s_3 \wedge s_4)^2 = 0$.    (3.12)

Conversely, if $x \neq 0$, it must represent the unique common oriented contact
Lie circle. If $x = 0$, then if $s_i \wedge s_j \wedge s_k = 0$ for any $1 \le i < j < k \le 4$, the four Lie
circles belong to a parabolic pencil, and have infinitely many common oriented
contact Lie circles; if $s_1 \wedge s_2 \wedge s_3 \neq 0$, then any Lie circle that is in common
oriented contact with $s_1, s_2, s_3$ must be in oriented contact with $s_4$.

When the $s_i$ are points, (3.12) can be written as

    $(s_1 \wedge s_2 \wedge s_3 \wedge s_4)^2 = \det(s_i \cdot s_j)_{i,j=1..4} = \frac{1}{16} \det(|c_i - c_j|^2)_{i,j=1..4} = 0$.



After factorization, we get

Theorem 3.3. [Ptolemy Theorem] If four points $c_i$, $i = 1, \ldots, 4$ are on the
same circle, then

    $d_{12} d_{34} \pm d_{14} d_{23} \pm d_{13} d_{24} = 0$,    (3.13)

where $d_{ij}$ is the distance between point $c_i$ and point $c_j$.
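The determinant condition for four points and its Ptolemy factor can be tested on concrete coordinates. The sketch below is illustrative only: the helper names `det`, `squared_distance_matrix` and `ptolemy_residual` are hypothetical, the sample circle and angles are arbitrary, and the sign pattern used in `ptolemy_residual` is the one valid when the four points are listed in cyclic order around the circle.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def squared_distance_matrix(pts):
    """Matrix of squared distances |c_i - c_j|^2 for planar points."""
    return [[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in pts] for p in pts]

def ptolemy_residual(pts):
    """d12*d34 + d14*d23 - d13*d24 for four points given in cyclic order."""
    d = lambda i, j: math.dist(pts[i], pts[j])
    return d(0, 1) * d(2, 3) + d(0, 3) * d(1, 2) - d(0, 2) * d(1, 3)

# Four points on the circle with center (2, -1) and radius 3, in cyclic order.
angles = [0.3, 1.1, 2.5, 4.0]
on_circle = [(2 + 3 * math.cos(a), -1 + 3 * math.sin(a)) for a in angles]

# Replacing the last point by the origin moves it off that circle.
off_circle = on_circle[:3] + [(0.0, 0.0)]

det_on = det(squared_distance_matrix(on_circle))    # vanishes up to rounding
det_off = det(squared_distance_matrix(off_circle))  # clearly nonzero
```

Both quantities vanish together for the concyclic configuration, which is what the factorization of the determinant into Ptolemy-type factors predicts.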

When the $s_i$ are circles, (3.12) can be written as

    $(s_1 \wedge s_2 \wedge s_3 \wedge s_4)^2 = \det(s_i \cdot s_j)_{i,j=1..4} = \frac{1}{16} \det(|s_i - s_j|^2)_{i,j=1..4} = 0$.

After factorization, we get

Theorem 3.4. [Casey Theorem] If four circles $c_i$, $i = 1, \ldots, 4$ are tangent to
the same circle, then

    $T_{12} T_{34} \pm T_{14} T_{23} \pm T_{13} T_{24} = 0$,    (3.14)

where $T_{ij}$ is the tangential distance between circle $c_i$ and circle $c_j$.

When the $s_i$ are lines, (3.12) is always true because the point at infinity is
on every line. There exists another common contact Lie circle if and only if

    $s_1 \wedge s_2 \wedge s_3 \wedge s_4 = 0$,    (3.15)

and either the four lines pass through a common point, or at least three of them
have a common oriented contact Lie circle other than the point at infinity.

4 Illustrative Examples 

Example 1. Let $ABC$ be a triangle in the plane. Let $a, b, c$ be unit vectors along
sides $AB$, $BC$, $CA$ respectively, and let $|AB| = l$. Represent the inscribed circle
of the triangle with $A, l, a, b, c$.

Below we use four different Clifford algebraic models to solve this problem.

Approach 1. The Clifford model $\mathcal{G}_2$.

Let $I$ be the center of the inscribed circle. Line $IA$ bisects $\angle BAC$, and line
$IB$ bisects $\angle ABC$. In the language of vectors, vector $I - A$ is parallel to vector
$c - a$, and vector $I - B$ is parallel to vector $a - b$. These constraints can be
represented by

    $(I - A) \wedge (c - a) = 0$,
    $(I - B) \wedge (a - b) = 0$.

From $B - A = l a$, we get $I - B = I - A + A - B = I - A - l a$. The second
equation can be written as

    $(I - A) \wedge (a - b) = -l\, a \wedge b$.

So

    $I - A = -l\, \frac{a \wedge b}{(c - a) \wedge (a - b)}\, (c - a)$.    (4.1)

The radius $\rho$ of the circle equals the distance from the center to line $AB$:

    $\rho = |P_a^\perp (I - A)| = |a \wedge (I - A)| = l\, \frac{|a \wedge b|\,|c \wedge a|}{|(c - a) \wedge (a - b)|}$.    (4.2)
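Formulas (4.1) and (4.2) are easy to evaluate in coordinates, since in the plane the outer product of two vectors has a single scalar coefficient. The following sketch is an illustration under that identification; the function names `wedge`, `unit` and `incircle` are hypothetical and not taken from the paper.

```python
import math

def wedge(u, v):
    """Scalar coefficient of the 2D outer product u ^ v."""
    return u[0] * v[1] - u[1] * v[0]

def unit(p, q):
    """Unit vector pointing from p to q."""
    d = math.dist(p, q)
    return ((q[0] - p[0]) / d, (q[1] - p[1]) / d)

def incircle(A, B, C):
    """Center and radius of the inscribed circle from Eqs. (4.1) and (4.2)."""
    a, b, c = unit(A, B), unit(B, C), unit(C, A)   # unit vectors along AB, BC, CA
    l = math.dist(A, B)                            # l = |AB|
    ca = (c[0] - a[0], c[1] - a[1])                # c - a, direction of the bisector at A
    ab = (a[0] - b[0], a[1] - b[1])                # a - b, direction of the bisector at B
    denom = wedge(ca, ab)
    scale = -l * wedge(a, b) / denom               # coefficient in Eq. (4.1)
    center = (A[0] + scale * ca[0], A[1] + scale * ca[1])
    radius = l * abs(wedge(a, b)) * abs(wedge(c, a)) / abs(denom)   # Eq. (4.2)
    return center, radius

# Right triangle with legs 4 and 3: the classical incircle has center (1, 1), radius 1.
center, radius = incircle((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
```

The ratio of two wedge coefficients is unchanged by a reflection of the plane, so the same function works for triangles of either orientation.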



Approach 2. The Grassmann model $\mathcal{G}_3$.

In this model, the plane is embedded in $\mathcal{R}^3$ as an affine plane away from the
origin. Since vector $I - A$ is parallel to vector $c - a$, and vector $I - B$ is parallel
to vector $a - b$, line $IA$ can be represented by $A \wedge (c - a)$, and line $IB$ can be
represented by $B \wedge (a - b)$. The intersection of the two lines is

    $I \simeq (B \wedge (a - b)) \vee (A \wedge (c - a))$
    $= (A \wedge B \wedge (a - b))^\sim (c - a) + (B \wedge (c - a) \wedge (a - b))^\sim A$
    $= (B \wedge (c - a) \wedge (a - b))^\sim \left( A + \frac{(A \wedge (B - A) \wedge (a - b))^\sim}{(B \wedge (c - a) \wedge (a - b))^\sim}\, (c - a) \right)$.

So $I - A = -l\, \frac{a \wedge b}{(c - a) \wedge (a - b)}\, (c - a)$.

The radius equals the distance from $I$ to line $AB$:

    $\rho = |I \wedge A \wedge a| = |a \wedge (I - A)| = l\, \frac{|a \wedge b|\,|c \wedge a|}{|(c - a) \wedge (a - b)|}$.



Approach 3. The homogeneous model $\mathcal{G}_{3,1}$.

Similar to the Grassmann model, line $IA$ can be represented by $e \wedge A \wedge (c - a)$,
and line $IB$ can be represented by $e \wedge B \wedge (a - b)$. The intersection of the two
lines is

    $e \wedge I \simeq (e \wedge B \wedge (a - b)) \vee (e \wedge A \wedge (c - a))$
    $= (e \wedge A \wedge B \wedge (a - b))^\sim\, e \wedge (c - a) + (e \wedge B \wedge (c - a) \wedge (a - b))^\sim\, e \wedge A$
    $= (e \wedge B \wedge (c - a) \wedge (a - b))^\sim\, e \wedge \left( A + \frac{(e \wedge A \wedge (B - A) \wedge (a - b))^\sim}{(e \wedge B \wedge (c - a) \wedge (a - b))^\sim}\, (c - a) \right)$.

So $I - A = -l\, \frac{a \wedge b}{(c - a) \wedge (a - b)}\, (c - a)$.

The radius equals the distance from $I$ to line $AB$:

    $\rho = |e \wedge I \wedge A \wedge a| = |a \wedge (I - A)| = l\, \frac{|a \wedge b|\,|c \wedge a|}{|(c - a) \wedge (a - b)|}$.



Approach 4. The Lie model $\mathcal{G}_{3,2}$.

Directed lines $AB$, $BC$, $CA$ are represented by null vectors $(e \wedge A \wedge a \wedge e_3)^\sim + e_3$,
$(e \wedge B \wedge b \wedge e_3)^\sim + e_3$, $(e \wedge A \wedge c \wedge e_3)^\sim + e_3$ respectively. The inscribed oriented
circle corresponds to the null 1-subspace other than the one generated by $e$ in
the 2-blade

i?2 = ((e A A A a A 63)"" -I- 63) A ((e A B A b A 63)"" -I- 63) 

A((e A A A c A 63)^ -|- 63))^ 

= (eAAAaAc3-|- e'^) V (e A B A b A 63 -I- ej') V (e A A A c A 63 -I- Cg ) 

= e A 63(6 A 63 A A A a A c)""(e A 63 A i? A b A A)~ 

-l-e A (c — a)(e A 63 A B A b A A)"" 

— e A A(e Ae3AAA(aAb + bAc + cA a))"" 

= — (eAe 3 A 24 A(aAb-|-bAc-|-cA a))~ 

(eAesAAA(B-A)Abr 

y (e A 63 A A A (a A b -I- b A c -I- c A a))~ 

(e A 63 A A A c A a)""(e A 63 A A A {B — A) A b)"" 
(eAe 3 A 24 A(aAb-|-bAc-fcA a))~ 




So the center of the circle is $I = A - l\, \frac{a \wedge b}{(c - a) \wedge (a - b)}\, (c - a)$, and the radius is
$l\, \frac{|a \wedge b|\,|c \wedge a|}{|(c - a) \wedge (a - b)|}$.

A comparison of the four approaches shows that the computation based on
the Lie model is not necessarily the simplest, considering the additional three
dimensions it requires. However, in the Lie model it is the original definition of
the inscribed circle of a triangle that is used in the algebraic description, and the
center and the radius are computed directly at the same time, instead of the center
being computed first and then used to compute the radius. The Lie model
behaves in a more "dummy-proof" way in algebraic description and computation.
Example 2. Let there be a convex polyhedron of five faces in the space. Find
the condition for the existence of an inscribed sphere of the polyhedron.





Fig. 6. Example 2 



Let the unit outer normals of the five faces be $n_i$, $i = 1, \ldots, 5$ respectively.
Let $A$ be the intersection of the three faces with normals $n_1, n_2, n_3$, and let $\delta_4$,
$\delta_5$ be the distances from $A$ to the faces with normals $n_4$, $n_5$, respectively.

Choose $A$ to be the origin of the space. The five faces can be represented by

    $s_1 = n_1 + e_4$,
    $s_2 = n_2 + e_4$,
    $s_3 = n_3 + e_4$,
    $s_4 = n_4 + \delta_4 e + e_4$,
    $s_5 = n_5 + \delta_5 e + e_4$.

For a convex 5-faced polyhedron, the five faces do not possess a common
point and at least four of them have a common tangent outward sphere. So the
existence of an inscribed sphere is equivalent to

    $s_1 \wedge s_2 \wedge s_3 \wedge s_4 \wedge s_5$
    $= e_4 \wedge e \wedge (\delta_4 (n_2 \wedge n_3 \wedge n_5 - n_1 \wedge n_3 \wedge n_5 + n_1 \wedge n_2 \wedge n_5 - n_1 \wedge n_2 \wedge n_3)$
    $\quad - \delta_5 (n_2 \wedge n_3 \wedge n_4 - n_1 \wedge n_3 \wedge n_4 + n_1 \wedge n_2 \wedge n_4 - n_1 \wedge n_2 \wedge n_3))$
    $= 0$,

i.e.,

    $\frac{\delta_4}{\delta_5} = \frac{\partial(n_1 \wedge n_2 \wedge n_3 \wedge n_4)}{\partial(n_1 \wedge n_2 \wedge n_3 \wedge n_5)}$.    (4.3)

A 5-faced convex polyhedron has an inscribed sphere if and only if for any
vertex $A$, the relation (4.3) holds. The right-hand side of (4.3) equals the ratio
of the signed volumes of two tetrahedra whose vertices are respectively points
$n_1, n_2, n_3, n_4$ and points $n_1, n_2, n_3, n_5$ on the unit sphere of the space. In
particular, if for some vertex $A$, $\delta_4 = 0$, then (4.3) becomes

    $\partial(n_1 \wedge n_2 \wedge n_3 \wedge n_4) = 0$,

i.e., the four points $n_1, n_2, n_3, n_4$ are on an affine plane of the space.
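Condition (4.3) can be tried out numerically by reading its right-hand side as a ratio of signed tetrahedron volumes, as stated above. The sketch below assumes each face is given as a plane $n_i \cdot y = h_i$ with unit outer normal $n_i$, so that $\delta_k = h_k - n_k \cdot A$ for a vertex $A$ on the inner side of face $k$; the function names and the sample pyramid are hypothetical.

```python
import math

def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix with rows r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def tetra_volume6(p, q, r, s):
    """Six times the signed volume of the tetrahedron with vertices p, q, r, s."""
    return det3(sub(q, p), sub(r, p), sub(s, p))

def vertex(n1, n2, n3, h1, h2, h3):
    """Intersection of the planes n_i . y = h_i, i = 1, 2, 3, by Cramer's rule."""
    rows = [list(n1), list(n2), list(n3)]
    h = [h1, h2, h3]
    d = det3(*rows)
    solution = []
    for k in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][k] = h[i]          # replace column k by the right-hand side
        solution.append(det3(*m) / d)
    return tuple(solution)

def inscribed_sphere_residual(normals, offsets):
    """delta4 * V(n1,n2,n3,n5) - delta5 * V(n1,n2,n3,n4); zero when (4.3) holds."""
    n1, n2, n3, n4, n5 = normals
    A = vertex(n1, n2, n3, *offsets[:3])
    delta4 = offsets[3] - dot(n4, A)
    delta5 = offsets[4] - dot(n5, A)
    return (delta4 * tetra_volume6(n1, n2, n3, n5)
            - delta5 * tetra_volume6(n1, n2, n3, n4))

# Square pyramid whose five faces are all tangent to the unit sphere at the origin.
a = 1.0 / math.sqrt(2.0)
pyramid_normals = [(0.0, 0.0, -1.0), (a, 0.0, a), (0.0, a, a), (-a, 0.0, a), (0.0, -a, a)]
tangent_offsets = [1.0, 1.0, 1.0, 1.0, 1.0]
shifted_offsets = [1.0, 1.0, 1.0, 1.0, 1.5]   # fifth face pushed outward

residual_tangent = inscribed_sphere_residual(pyramid_normals, tangent_offsets)
residual_shifted = inscribed_sphere_residual(pyramid_normals, shifted_offsets)
```

The residual is written in cross-multiplied form so that the case $\delta_5 = 0$ or a vanishing volume does not cause a division by zero.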

5 Conclusion 

The Lie model is principally for geometric problems involving oriented contact 
of spheres and hyperplanes. This model cannot deal with problems on conformal 
properties without resorting to the homogeneous model. It cannot represent 
lower dimensional spheres and planes. It can serve as a supplementary tool for 
the homogeneous model in classical geometry. 
Abstract. The structure of hypersurfaces corresponding to different
spatio-temporal patterns is considered, and in particular representations
based on geometrical invariants, such as the Riemann and Einstein tensors
and the scalar curvature, are analyzed. The spatio-temporal patterns
result from translations, Lie-group transformations, accelerated and
discontinuous motions, and modulations. Novel methods are obtained for
the computation of motion parameters and the optical flow. Moreover,
results obtained for accelerated and discontinuous motions are useful for
the detection of motion boundaries.

Keywords: dynamic features, motion, flow field, differential geometry,
curvature tensor, Lie transformation groups.



1 Introduction 

The input to the human and most technical vision systems is light intensity $f$
as a function of space and time. This function defines a hypersurface

    $S = \{x, y, t, f(x, y, t)\}$    (1)

which has the form of a 3-dimensional Monge patch. From a geometric point
of view the curvature is the most important property of the surface in that it
determines the intrinsic structure of the manifold [10], so it is of interest to
investigate how different types of visual inputs are represented by the curvature tensor
of (1). Further, two other geometric invariants, namely the scalar curvature and
the Einstein tensor, will also be considered. The goal is to gain a better
understanding of multidimensional signals and visual processing. In vision-science
terms, nonlinear representations of dynamic visual inputs are considered. Such
representations are generic but of interest to the perception-action cycle. For
example, the points on (1) with significant curvature can track moving patterns
and the curvature tensor can be used to compute motion parameters [6,3,5]. In
this paper, however, we consider the theoretical aspects only. Applications have
been presented elsewhere, including models of biological visual processing [6,3,5].

Geometric methods in computer vision most often deal with the extrinsic 
geometry of objects in 3D space and how these objects and their motions project 
on the image plane. However, the geometry of the hypersurface (1) has been 
used for motion detection [8] with an algorithm based on the gradient of (1). It 
has also been shown that the Gaussian curvature of (1) can be used to detect 
motion discontinuities [14]. Our approach is related to the so-called
structure-tensor method (see [7] for a review), and this relationship will be discussed in a
forthcoming paper [4].

2 Translation with Constant Velocity 

If the image sequence $f(x, y, t)$ results from any spatial pattern moving with
constant velocity $v = \{v_x, v_y\}$, $f$ is assumed to satisfy the constraint [2]

    $f(x, y, t) = f(x + dx, y + dy, t + dt)$,    (2)

which leads to [2]

    $\frac{\partial f}{\partial t} = -\nabla f \cdot v$,    (3)

with $\nabla f$ being the spatial gradient of $f$. Finally, the solution of (3) is

    $f(x, y, t) = f(x - v_x t, y - v_y t)$,    (4)

showing that the image can be thought of as a “solitary wave” which moves, 
without distortion, with constant velocity along a straight line and whose shape 
is determined at any given time t by 



2.1 Riemann Curvature Tensor 

In this section we first summarize results that have been obtained previously [6,3]
and that will be compared to the results in the following sections.

If we compute the components of the curvature tensor (see Eq. 31 in the
Appendix) for the specific function $f$ in Eq. 4, and then simplify all possible
ratios of components, we obtain the following results:¹

    $v_1 = \{R_{3221}, -R_{3121}\}/R_{2121}$
    $v_2 = \{R_{3231}, -R_{3131}\}/R_{3121}$    (5)
    $v_3 = \{R_{3232}, -R_{3231}\}/R_{3221}$.

Here indices simply denote the fact that we obtain different expressions for $v$. All
representations $v_i$ were obtained by assuming the constant brightness constraint.

¹ These and the following simplifications have been performed with the aid of the
software Mathematica [13].

Note that $v_1$ is the classical solution obtained for the optical flow under the
assumption of constant spatial gradient [12] (this is not surprising since this
assumption is more general and includes the constraint in Eq. 4).

From Eqs. 5 we can obtain a further motion vector $v_4 = \{v_{4x}, v_{4y}\}$ with

    $v_{4x} = \mathrm{sign}(v_{1x}) \sqrt{R_{3232}/R_{2121}}$, $v_{4y} = \mathrm{sign}(v_{1y}) \sqrt{R_{3131}/R_{2121}}$.    (6)

It seems an interesting result that the sectional curvatures (cf. Eq. 31)
determine the direction of motion (but for the sign, which is here taken from the
vector $v_1$).

To summarize, we found four different combinations of $R$ components that
are equal, and equal to the motion vector, in case that Eq. 4 holds ($v = v_1 =
v_2 = v_3 = v_4$).² We shall see in later sections how these expressions might
differ for patterns other than (4).
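The four expressions can be evaluated numerically for a translating test pattern. The sketch below is not the computation used in the paper (which relies on Eq. 31 of the Appendix and Mathematica). It assumes the standard Gauss equation for a Monge patch, $R_{ijkl} = (f_{ik} f_{jl} - f_{il} f_{jk})/(1 + |\nabla f|^2)$ with indices 1 = x, 2 = y, 3 = t, drops the common positive denominator because only ratios are needed, and estimates second derivatives by central differences. The names `hessian`, `riemann`, `motion_vectors`, `pattern` and `sequence` are hypothetical.

```python
import math

def hessian(f, p, h=1e-3):
    """Central-difference Hessian of f(x, y, t) at the point p = (x, y, t)."""
    def shifted(i, si, j, sj):
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return f(*q)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(3)] for i in range(3)]

def riemann(H, i, j, k, l):
    """R_ijkl up to the common factor 1 / (1 + |grad f|^2); 1 = x, 2 = y, 3 = t."""
    i, j, k, l = i - 1, j - 1, k - 1, l - 1
    return H[i][k] * H[j][l] - H[i][l] * H[j][k]

def motion_vectors(f, p):
    """The vectors v1..v4 of Eqs. (5) and (6), evaluated at the point p."""
    H = hessian(f, p)
    R = lambda *idx: riemann(H, *idx)
    v1 = (R(3, 2, 2, 1) / R(2, 1, 2, 1), -R(3, 1, 2, 1) / R(2, 1, 2, 1))
    v2 = (R(3, 2, 3, 1) / R(3, 1, 2, 1), -R(3, 1, 3, 1) / R(3, 1, 2, 1))
    v3 = (R(3, 2, 3, 2) / R(3, 2, 2, 1), -R(3, 2, 3, 1) / R(3, 2, 2, 1))
    v4 = (math.copysign(math.sqrt(R(3, 2, 3, 2) / R(2, 1, 2, 1)), v1[0]),
          math.copysign(math.sqrt(R(3, 1, 3, 1) / R(2, 1, 2, 1)), v1[1]))
    return v1, v2, v3, v4

def pattern(u, w):
    """Arbitrary smooth spatial test pattern."""
    return math.sin(u) * math.cos(2.0 * w)

def sequence(x, y, t):
    """The pattern translating with velocity v = (0.7, -0.3), as in Eq. (4)."""
    return pattern(x - 0.7 * t, y + 0.3 * t)

vectors = motion_vectors(sequence, (0.4, 0.2, 0.5))
```

For this input all four vectors agree with the velocity (0.7, -0.3) up to discretization error, provided the point is chosen so that none of the denominators vanishes.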



2.2 Einstein Tensor 

As for the curvature tensor, we can obtain four expressions for the motion vector
by simplifying the components of the Einstein tensor $G$, which is obtained from
the Riemann tensor through a contraction of the indices (see [11] for definition
and properties):

    $v_1 = \{G_{11}, G_{21}\}/G_{31}$
    $v_2 = \{G_{21}, G_{22}\}/G_{32}$    (7)
    $v_3 = \{G_{31}, G_{32}\}/G_{33}$.

The expressions for the components of $G$ contain first and second order
derivatives. Unfortunately, these expressions are too large to be printed here but are
available on this paper's website [1].

As in Section 2.1 we can obtain a further motion vector from the relation
$\{v_{4x}^2, v_{4y}^2\} = \{G_{11}, G_{22}\}/G_{33}$.

2.3 Scalar Curvature 

So far we have considered tensor-based representations of spatio-temporal
patterns. It can be useful, however, to consider also scalar quantities that can be
derived from $S$. The scalar curvature $C$ is a contraction of $R$ [10,11]. Under the
constraint (4) $C$ simplifies to

    $C = \frac{2 (1 + v \cdot v)(f_{\chi\chi} f_{\xi\xi} - f_{\chi\xi}^2)}{(1 + \nabla f \cdot \nabla f + (\nabla f \cdot v)^2)^2}$    (8)

² Note that if we simplify the indices in Eqs. 5 and 6, i.e., we just set 3221/2121 =
3/1, 3121/2121 = 3/2, ..., we obtain {3/1, 3/2} for the first three vectors and
{33/11, 33/22} for $\{v_{4x}^2, v_{4y}^2\}$.

with $f$ being a function of $(\chi, \xi)$ where $\chi = x - v_x t$, $\xi = y - v_y t$, and $\nabla f =
\{f_\chi, f_\xi\}$. The dot denotes the scalar product, and indices in $f_\chi$ and $f_{\chi\chi}$
denote first- and second-order partial derivatives respectively. Note that for zero
velocity, the 3D scalar curvature is just the 2D Gaussian curvature in $(x, y)$, as
should be expected.

3 Lie Transformation Groups 

So far we have considered spatio-temporal patterns that arise from a translation.
However, spatio-temporal patterns can result from a variety of transformations.
To investigate how the constant brightness constraint is modified in this case we
shall make use of the theory of Lie transformation groups [9].

If the image is transformed by the action of a linear one-parameter Lie
transformation group with infinitesimal operator $X_\lambda = a_1(x, y)\,\partial/\partial x + a_2(x, y)\,\partial/\partial y$,
$\lambda$ being the parameter of the transformation, then the fundamental flow
constraint can be written as

    $f(r, t) = f(r', t + dt)$,    (9)

where $r = \{x, y\}$ and $r' = r + dr$. The transformation $r \to r'$ results in

    $dx = x' - x = a_1(x, y)\,d\lambda$, $dy = y' - y = a_2(x, y)\,d\lambda$.    (10)



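A straightforward application (omitted here for brevity) of Lie group theory
shows that Eq. 9 leads to

    $\frac{\partial f}{\partial t} = -\frac{d\lambda}{dt}\, a \cdot \nabla f$,    (11)

where $a(x, y) = \{a_1(x, y), a_2(x, y)\}^T$, and here $\lambda$ has been considered a function of $t$, as
it must be in case of motion.

If several transformation groups are considered, with differential
operators $X_{\lambda_j}$, then Eq. 11 becomes

    $\frac{\partial f}{\partial t} = -\sum_j \frac{d\lambda_j}{dt}\, a_j \cdot \nabla f$.    (12)

Suppose $d\lambda_j/dt = \nu_j$ to be constant and that $a_{j1} = a_{j1}(y)$, $a_{j2} = a_{j2}(x)$; then
the solution of Eq. 12 is

    $f(x, y, t) = f\left(x - \sum_j a_{j1}(y)\,\nu_j t,\; y - \sum_j a_{j2}(x)\,\nu_j t\right)$.    (13)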



For instance, consider the general rigid motion in 2D, which is given by two
translations along the coordinate axes and by a rotation; in this case $a_1 = \{1, 0\}$, $\nu_1 =
v_{Ox}$, $a_2 = \{0, 1\}$, $\nu_2 = v_{Oy}$, where $O$ is the center of rotation. Then the velocity
of the center of rotation is just $v_O = \nu_1 a_1 + \nu_2 a_2 = \{v_{Ox}, v_{Oy}\}$, which can also be
obtained by usual kinematics. The rotation around $O$ is given by $a_3 = \{-y, x\}$,
$\nu_3 = \omega$, where $\omega$ is the angular velocity. Suppose $v_O$ and $\omega$ constant; Eq. 13
becomes

    $f(x, y, t) = f(x - v_{Ox} t + \omega y t,\; y - v_{Oy} t - \omega x t)$,    (14)

where $x, y$ are coordinates with respect to $O$.

For this general case, however, it seems difficult to analyze the effect of such
patterns on the spatio-temporal curvature without additional assumptions.
From Eq. 14 a rotation constraint is defined by

    $f(x, y, t) = f(x + \omega y t,\; y - \omega x t)$.    (15)

As for Eq. 15 itself, the results for this transformation can be obtained by simply
setting $v_{Ox} = 0$ and $v_{Oy} = 0$ in Eq. 14 and in the equations obtained below for
the transformation (14).



3.1 Riemann Curvature Tensor 



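For this type of input the vectors $v_i$ differ, and they depend on $x, y, t, v_{Ox}, v_{Oy}, \omega$
and the first and second order derivatives of $f(\chi, \xi)$ with $\chi = x - v_{Ox} t + \omega y t$
and $\xi = y - v_{Oy} t - \omega x t$.

However, we obtain interesting results if we further assume that the gradient
of $f$ vanishes. In this case the components of $R$ are:

    $R_{2121} = D (1 + t^2 \omega^2)^2$
    $R_{3131} = D (v_{Oy} + t v_{Ox} \omega + x \omega - t y \omega^2)^2$
    $R_{3232} = D (v_{Ox} - t v_{Oy} \omega - y \omega - t x \omega^2)^2$
    $R_{3121} = D (1 + t^2 \omega^2)(-v_{Oy} - t v_{Ox} \omega - x \omega + t y \omega^2)$    (16)
    $R_{3221} = -D (1 + t^2 \omega^2)(-v_{Ox} + t v_{Oy} \omega + y \omega + t x \omega^2)$
    $R_{3231} = -D (v_{Ox} - t v_{Oy} \omega - y \omega - t x \omega^2)(v_{Oy} + t v_{Ox} \omega + x \omega - t y \omega^2)$

with $D = f_{\chi\chi} f_{\xi\xi} - f_{\chi\xi}^2$ and $f$ as a function of $(\chi, \xi)$ defined as above.
It is straightforward to check that
in this case all the vectors $v_i$ (Eqs. 5 and 6) point in the same direction, which
is the direction of the vector:

    $\{v_{Ox} - t v_{Oy} \omega - y \omega - t x \omega^2,\; v_{Oy} + t v_{Ox} \omega + x \omega - t y \omega^2\}$    (17)

(substituting the components (16) into $v_1 = \{R_{3221}, -R_{3121}\}/R_{2121}$ gives this
vector divided by $1 + t^2 \omega^2$; for $\omega = 0$ it reduces to $\{v_{Ox}, v_{Oy}\}$).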



3.2 Einstein Tensor 

For the Einstein tensor, again, we could not obtain useful simplifications but for
the case of zero gradient. Surprisingly, the independent components of $G$ are
equal to those of $R$ in this case (but for the signs):

    $G_{33} = -R_{2121}$
    $G_{22} = -R_{3131}$
    $G_{11} = -R_{3232}$    (18)
    $G_{32} = R_{3121}$
    $G_{31} = -R_{3221}$
    $G_{21} = R_{3231}$



3.3 Scalar Curvature 



For zero gradient the scalar curvature simplifies to

    $C = 2 D (1 + t^2 \omega^2)(1 + v_{Ox}^2 + v_{Oy}^2 + 2 v_{Oy} x \omega - 2 v_{Ox} y \omega + \omega^2 (t^2 + x^2 + y^2))$.    (19)

Note that for zero rotation and velocity, $C$ is, in coordinates $(\chi, \xi)$, the 2D
Gaussian curvature (with zero gradient).

4 Translation with Time-Dependent Velocity 

We now consider the more general case where the image shift contains
higher-order terms, that is, the motion can be accelerated:

    $f(x, y, t) = f(x - d_1(t),\; y - d_2(t))$.    (20)

4.1 Riemann Curvature Tensor

With the constraint in Eq. 20, we still obtain for the curvature tensor

    $\{R_{3221}, -R_{3121}\}/R_{2121} = \{d_1'(t), d_2'(t)\}$,    (21)

but the other three expressions $\{R_{3231}, -R_{3131}\}/R_{3121}$, $\{R_{3232}, -R_{3231}\}/R_{3221}$,
and $\{R_{3232}, R_{3131}\}/R_{2121}$ do not simplify to yield the velocity components.

However, if we assume that the gradient of $f(\chi, \xi)$ vanishes ($f_\chi = f_\xi = 0$),
we obtain the following relations:

    $\{R_{3231}, -R_{3131}\}/R_{3121} = \{d_1'(t), d_2'(t)\}$
    $\{R_{3232}, -R_{3231}\}/R_{3221} = \{d_1'(t), d_2'(t)\}$    (22)
    $\{R_{3232}, R_{3131}\}/R_{2121} = \{d_1'(t)^2, d_2'(t)^2\}$

i.e., the motion vectors $v_2$, $v_3$, and $v_4$ are obtained only for local extrema of
$f(\chi, \xi)$ (that are extrema of $f(x, y)$ also).
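The difference between Eq. 21 and Eqs. 22 can be observed numerically. The sketch below makes the same assumptions as any finite-difference check of this kind: it uses the standard Gauss equation for a Monge patch, $R_{ijkl} \propto f_{ik} f_{jl} - f_{il} f_{jk}$ with 1 = x, 2 = y, 3 = t (the common positive factor cancels in the ratios), and a hypothetical accelerated shift $d_1(t) = 0.5 t^2$, $d_2(t) = 0.3 t + 0.2 t^2$. All function names are illustrative.

```python
import math

def hessian(f, p, h=1e-3):
    """Central-difference Hessian of f(x, y, t) at the point p = (x, y, t)."""
    def shifted(i, si, j, sj):
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return f(*q)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(3)] for i in range(3)]

def minor(H, i, j, k, l):
    """f_ik f_jl - f_il f_jk, i.e. R_ijkl up to a common positive factor."""
    i, j, k, l = i - 1, j - 1, k - 1, l - 1
    return H[i][k] * H[j][l] - H[i][l] * H[j][k]

def v1_v2(f, p):
    """The first two vectors of Eqs. (5), evaluated at the point p."""
    H = hessian(f, p)
    R = lambda *idx: minor(H, *idx)
    v1 = (R(3, 2, 2, 1) / R(2, 1, 2, 1), -R(3, 1, 2, 1) / R(2, 1, 2, 1))
    v2 = (R(3, 2, 3, 1) / R(3, 1, 2, 1), -R(3, 1, 3, 1) / R(3, 1, 2, 1))
    return v1, v2

def d1(t):
    return 0.5 * t * t            # d1'(1) = 1.0

def d2(t):
    return 0.3 * t + 0.2 * t * t  # d2'(1) = 0.7

def sequence(x, y, t):
    """Test pattern sin(chi) cos(2 xi) shifted by (d1(t), d2(t)), as in Eq. (20)."""
    chi, xi = x - d1(t), y - d2(t)
    return math.sin(chi) * math.cos(2.0 * xi)

generic = (0.8, 0.7, 1.0)                  # chi = 0.3, xi = 0.2: nonzero gradient
extremum = (math.pi / 2 + 0.5, 0.5, 1.0)   # chi = pi/2, xi = 0: zero gradient

v1_generic, v2_generic = v1_v2(sequence, generic)
v1_extremum, v2_extremum = v1_v2(sequence, extremum)
```

At the generic point only the first vector reproduces the instantaneous velocity $\{d_1'(1), d_2'(1)\}$, while at the extremum of the pattern both vectors do, in line with the statement above.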
4.2 Einstein Tensor

For the Einstein tensor under the constraint (20) we could not obtain any
simplifications. However, under the additional constraint of zero gradient (see above)
we obtain:

    $\{G_{11}, G_{21}\}/G_{31} = \{d_1'(t), d_2'(t)\}$
    $\{G_{21}, G_{22}\}/G_{32} = \{d_1'(t), d_2'(t)\}$
    $\{G_{31}, G_{32}\}/G_{33} = \{d_1'(t), d_2'(t)\}$    (23)
    $\{G_{11}, G_{22}\}/G_{33} = \{d_1'(t)^2, d_2'(t)^2\}$


4.3 Scalar Curvature 

In case of the additional assumption of zero gradient (see above), the scalar
curvature simplifies to:

    $C = 2 (1 + d_1'(t)^2 + d_2'(t)^2)(f_{\xi\xi} f_{\chi\chi} - f_{\chi\xi}^2)$    (24)

with $f$ being a function of $(\chi, \xi)$ where $\chi = x - d_1(t)$, $\xi = y - d_2(t)$.

5 Discontinuous Motion 

In this section we consider different types of motion discontinuities and how
they are represented by the curvature tensor. In particular, we will show that
the expressions for the vectors $v_i$ in Eqs. 5 and 6 differ. Therefore the differences
can be used as indicators of discontinuous motions [3,4]. The exceptions are the
locations where the gradient of $f$ vanishes (local extrema).

5.1 Velocity Step 

We first consider the case where the velocity vector changes suddenly from zero
to $\{v_x, v_y\}$, i.e., the image-sequence intensity $f(x, y, t)$ is defined by

    $f(x, y, t) = f(x - v_x \gamma(t),\; y - v_y \gamma(t))$    (25)

where $\gamma(t)$ is the unit step function. We obtain

    $\{R_{3221}, -R_{3121}\}/R_{2121} = \{\delta(t) v_x, \delta(t) v_y\}$    (26)

where $\delta(t)$ is the Dirac delta distribution (this is Eq. 21 with $d_i(t) = v_i \gamma(t)$).

Note that this vector is different from zero only at $t = 0$, when it points in
the direction of the motion vector $\{v_x, v_y\}$, i.e., $-R_{3121}/R_{3221} = v_y/v_x$. This is
not the case for the other three vectors in Eqs. 5 and 6. For example, for $v_2$ we
obtain:

o / o + Vvd'{t)fdxx - Vy^ditffiifxx + 

^3131/^3231 s:(+\2( f -P ^ 

dityiVxVyf^J^^ - VxVyf^^ ) 



( 27 ) 
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with / as a function of (XjC) x = x — Vxj{t), ^ = y — Vyj{t). Similar 
but different expressions are obtained for R 3231 /R 3232 and R 3131 / R 3232 - For the 
extrema of /(X ;0 (assumption of zero gradient as above), however, all the four 
vectors (Eq. 5 and 6 ) point in the direction of {vx,Vy\. 
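The translation case can be checked numerically. The following is a minimal sketch, not the authors' implementation: it evaluates the six curvature components listed in Appendix A and the vector $\{R_{3221}, -R_{3121}\}/R_{2121}$ for a smoothly translating pattern. The test pattern $g(u,w) = \sin(u)\cos(2w)$, the sample point and the function names are assumptions chosen only for illustration.

```python
import math

def riemann_components(fx, fy, ft, fxx, fxy, fyy, fxt, fyt, ftt):
    """Six curvature components as listed in Appendix A."""
    n = 1.0 + fx * fx + fy * fy + ft * ft
    return {
        "R2121": (fyy * fxx - fxy * fxy) / n,
        "R3131": (ftt * fxx - fxt * fxt) / n,
        "R3232": (ftt * fyy - fyt * fyt) / n,
        "R3121": (fyt * fxx - fxt * fxy) / n,
        "R3221": (fyt * fxy - fyy * fxt) / n,
        "R3231": (ftt * fxy - fxt * fyt) / n,
    }

def motion_vector_v1(r):
    """The vector {R3221, -R3121}/R2121 (requires R2121 != 0)."""
    return (r["R3221"] / r["R2121"], -r["R3121"] / r["R2121"])

def translating_pattern_derivatives(x, y, t, vx, vy):
    """Analytic derivatives of f(x,y,t) = g(x - vx*t, y - vy*t),
    with the hypothetical test pattern g(u,w) = sin(u)*cos(2w)."""
    u, w = x - vx * t, y - vy * t
    gu = math.cos(u) * math.cos(2 * w)
    gw = -2 * math.sin(u) * math.sin(2 * w)
    guu = -math.sin(u) * math.cos(2 * w)
    guw = -2 * math.cos(u) * math.sin(2 * w)
    gww = -4 * math.sin(u) * math.cos(2 * w)
    ft = -vx * gu - vy * gw
    fxt = -vx * guu - vy * guw
    fyt = -vx * guw - vy * gww
    ftt = vx * vx * guu + 2 * vx * vy * guw + vy * vy * gww
    return (gu, gw, ft, guu, guw, gww, fxt, fyt, ftt)

# Constant velocity (0.5, -0.75): the vector should recover the velocity.
derivs = translating_pattern_derivatives(0.9, 0.1, 0.4, 0.5, -0.75)
v1 = motion_vector_v1(riemann_components(*derivs))
print(v1)
```

For a constant-velocity translation the common factor $1 + \nabla f^2$ cancels in the ratio, so the recovered vector does not depend on the gradient magnitude at the chosen point.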



5.2 Onset of a Spatial Pattern 

Here we consider the case:

$$f(x, y, t) = f(x, y)\,\gamma(t) \qquad (28)$$

i.e., the spatial pattern $f(x,y)$ is turned on at time $t = 0$.
We obtain the following results:



{^3221, —^ 3121 } 
^2121 

{-R 323 I; ~-^313l} 
^3121 

{^3232, ~^323l} 
^3221 

{^3232, ^3131} 
^2121 



fyf^y fxfxy ' 
f 2 _ f f ^ f 2 _ 

Jxy JyyJxx Jxy 
r S{t)'^f^fx-f{x,y)‘y{t)S'{t)fa,y 

1 S(t){j{t)f^f:cy-'r(t)fyfxx) ’ 

r -S{tffy^+f{^,vh(t)S'it)fyy 

1 fyy fx+S(t)l(t) fy fxy ’ 

r -S{tffy'^+f{x,y)j{t)S'{t)fyy 

^ -l{t)fxy‘‘+l(t)fyyfxx ’ 



fyfxx 



' fyyfxx 

-(S{tffx‘^)+f{x:,y)'r{t)5'(t)fxx 

-{S{t)'j{t)fxfxy)+S{t)'l{t)fyfxx 

fy fx+ f {x ,y)^(t)5' (t) fxy 
-S(t)'rit)fyyfx+S{t)'l{t)fyfxy 

-Sit)^fx^+f{x,y)-y{t)S'(t)fxx 1 

-l{t) fxy^ +l{t)fyy fxx ^ 



} 

} 



(29) 



Note that the four expressions, which are equal for translations, differ for this 
specific dynamic pattern. For this type of input (Eq. 28) it is interesting to look 
at the components of R for the case of zero spatial gradient. We obtain the 
following results: 



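$$R_{2121} = \left(-f_{xy}^2 + f_{yy} f_{xx}\right)/N$$

$$R_{3131} = \left(f(x,y)\,\gamma(t)\,\delta'(t)\,f_{xx}\right)/N$$

$$R_{3232} = \left(f(x,y)\,\gamma(t)\,\delta'(t)\,f_{yy}\right)/N$$

$$R_{3121} = 0$$

$$R_{3221} = 0$$

$$R_{3231} = \left(f(x,y)\,\gamma(t)\,\delta'(t)\,f_{xy}\right)/N$$

with:

$$N = 1 + \delta(t)^2 f(x,y)^2 \qquad (30)$$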



Note that two of the components are zero, such that the vector $V_1$ is zero and
the vectors $V_2$ and $V_3$ are undefined due to a zero denominator.

By substituting $\delta$ for $\gamma$, $\delta'$ for $\delta$, and $\delta''$ for $\delta'$ we obtain the results for
a flashing pattern, i.e., $f(x,y,t) = f(x,y)\,\delta(t)$. The above results are a special
case of modulation, i.e. $f(x,y,t) = f(x,y)\,a(t)$ with $a(t) = \gamma(t)$ and $a'(t) = \delta(t)$.

6 Discussion

Differential geometry provides powerful tools for analyzing the geometric struc- 
ture of multidimensional manifolds. With these tools it is possible to construct 
invariants that capture the structure of the manifold [10,11]. We have consid- 
ered the visual input as a manifold with a specific metric that is defined by 
image intensity f{x,y,t) (it is the metric of the hypersurface in Eq. 1), and we 
have looked at the curvature tensor R of that manifold as the most prominent 
geometric invariant and at two specific contractions of R (we had also looked 
at the Ricci tensor but had not obtained any meaningful result). In particular, 
we have shown how selected constraints on $f$ that are related to motion affect
these geometric invariants. By doing so we have found novel methods for the
computation of motion parameters.

Thus, the reported results show that relevant information about spatio- 
temporal patterns can be gained by analyzing the above-mentioned curvature 
measures. We have first considered translations and have obtained new expres- 
sions for the flow fields in terms of the components of the Einstein tensor. We 
have then generalized the usual constraint Eqs. 2 and 3 to the more general 
case of transformations that form Lie transformation groups. For these transfor- 
mations we have shown how the transformations giving rise to spatio-temporal 
patterns are encoded by the curvature measures. Meaningful results, however, 
have been obtained only for zero gradient, i.e. the local extrema of $f(\chi, \xi)$ (that
are extrema of $f(x, y)$ as well) with coordinates $(\chi, \xi)$ depending on the
transformation. Finally, we have also considered discontinuous motions that have been
described by step functions and Dirac-Delta distributions. These functions have 
been analyzed analytically as global patterns, but, of course, the scope is to de- 
tect local discontinuities and motion boundaries. In practical applications, the 
size of the local neighborhood will be determined by the filters used to compute 
the derivatives, and these filters can be implemented on multiple scales. 

Methods based on the four motion vectors derived from R, i.e. Eqs. 5 and 6, 
have already been applied, both to obtain robust motion estimations and to 
model biological motion sensitivity [6,3,5]. The authors had assumed that the 
four motion vectors will differ in case of discontinuous motions and have used 
these differences as indicators of occlusions and noise. Here we have shown that 
the vectors do indeed differ for such patterns. It seems an important result 
that the vector $V_1$ still yields the correct motion in case of accelerated motions
(Eq. 21) but the other three vectors do not (except for zero gradient). For
discontinuous motions the vector $V_1$ again plays a distinct role and thus supports the
idea of confidence measures based on the differences among the vectors $V_i$. The
question of how many vectors to use in which combinations still needs further 
investigation, but applications show that the use of all four vectors improves the 
results compared to using only two or three vectors. 

In conclusion, we have shown that the intrinsic geometry of spatio-temporal
patterns, generated by specific transformations, provides useful information on
the parameters of the transformations and new insights for the coding of motion
and dynamic features.

A Components of the Riemann Curvature Tensor

$$R_{2121} = (f_{yy} f_{xx} - f_{xy}^2)/(1 + \nabla f^2)$$

$$R_{3131} = (f_{tt} f_{xx} - f_{xt}^2)/(1 + \nabla f^2)$$

$$R_{3232} = (f_{tt} f_{yy} - f_{yt}^2)/(1 + \nabla f^2)$$

$$R_{3121} = (f_{yt} f_{xx} - f_{xt} f_{xy})/(1 + \nabla f^2)$$

$$R_{3221} = (f_{yt} f_{xy} - f_{yy} f_{xt})/(1 + \nabla f^2)$$

$$R_{3231} = (f_{tt} f_{xy} - f_{xt} f_{yt})/(1 + \nabla f^2)$$

with:

$$1 + \nabla f^2 = 1 + f_x^2 + f_y^2 + f_t^2$$

Abstract. In this paper we propose a new type of neuron developed
in the framework of Clifford algebra. It is shown how this novel Clifford
neuron covers complex and quaternionic neurons and that it can compute
orthogonal transformations very efficiently. The introduced framework
can also be used for neural computation of non-linear geometric
transformations, which makes it very promising for applications. As an
example we develop a Clifford neuron that computes the cross-ratio via
the corresponding Möbius transformation. Experimental results for the
proposed novel neural models are reported.



1 Introduction 

The aim of neural computation is to understand neural and cognitive processes 
in biological systems, in particular in the human brain, in computational terms 
and to design and analyze models and algorithms towards these goals. The con- 
nectionist approaches among these are known as Neural Networks (NNs). 
Neural networks are interconnections of artificial "neurons" that are greatly
simplified versions of biological neurons [14].

The first model of an artificial neuron was proposed more than 50 years ago,
by McCulloch and Pitts in 1943 [10]. However, the concept did not reach a wide
audience in the Artificial Intelligence community before the end of the 1980s,
when suitable learning algorithms for training multi-layer perceptrons (MLPs)
were (re-)discovered in [13]. Nowadays artificial neural networks are very common
and are used in many application fields, ranging from classical ones like pattern
recognition or computer linguistics to modern ones like financial forecasting,
medical diagnosis systems and energy management systems. Thus, a kind of
paradigm shift from symbolic approaches (programming) to neural networks
(learning) could be observed, a change that has also been pushed by many
strong mathematical properties of neural networks that could be proven, e.g. the
universal approximation property of sigmoidal MLPs [4].

On the other hand, NNs (like any other existing approach) are still far away
from brains in every respect. Besides that distance to the ultimate goal, there are
computational problems too. One is the famous bias/variance trade-off, formulated
precisely in terms of statistical learning theory in [6], which arises from the
fact that NNs can only learn (in practice) from a finite set of examples. Roughly
speaking, the only solution is to integrate a priori knowledge. Actually, many other
problems that one faces in neural computation reduce in a broader sense to this
type of problem.

For example, it is impossible to predict whether a real-valued MLP or a
complex-valued MLP will perform better at learning a complex-valued function
[9]. Both networks are universal approximators. The high non-linearities this
requires can create networks with high "variance", which also makes rule
extraction from such networks difficult. In such monolithically structured
networks the integration of domain- or task-specific knowledge ("bias") is not
easy.

One way to overcome this "black-box" behavior of NNs is to design small
specific subnetworks that perform simpler tasks and to build up more complicated
networks from such a functional base. In terms of supervised learning this
so-called "model-based" approach [3] also has the advantage that optimization
is simplified due to constrained lower-dimensional weight spaces. In a recent
paper [5] the approach is extended to the neural computation of algebraic and
geometric structures like eigenvalues of matrices or surface representations.

In this paper we follow similar intentions, but focus exclusively on the neural
computation of geometric transformations. We do this because of the fundamental
importance of geometric transformations in all aspects of the perception-action
cycle related to both pattern generation and recognition. Thus, following
Felix Klein's Erlangen program, we will demonstrate how to design NNs as
building blocks of geometric expert knowledge by algebraically constraining
neural computation. Linear transformations of real vector spaces can be computed
easily by a Single Layer Neural Network (SL-NN). This will be our starting point in
the next section. There we will also see that SL-NNs have some drawbacks. The
first and fundamental one is obviously the limitation to real linear
transformations, which excludes many interesting transformations such as projective
or Möbius transformations. The second one results from the fact that only certain
linear transformations are needed in most applications, like affine transformations
in object recognition or rigid motions in robotics. Both drawbacks result from the
fact that no additional geometrical a priori knowledge can be integrated into
SL-NNs.

Therefore, we propose instead a new type of neuron, the Clifford neuron, which
is formulated in terms of Clifford algebra. Since a Clifford algebra is constructed
from a real vector space with a quadratic form on it, concepts of metric geometry
can be used easily. It is shown that Clifford neurons compute orthogonal
transformations very efficiently. Moreover, they can be used to compute non-linear
geometric transformations, which allows many interesting applications. As
an example, we develop a Clifford neuron that computes the cross-ratio via the
corresponding Möbius transformation. Experimental results confirming the
proposed novel neural models are presented.

2 Single Layer Neural Networks

A formal neuron is a computational unit of the form shown in Figure 1.

Fig. 1. Formal neuron

First, a propagation function $f$ associates the input vector $x$ with the weight
vector $w$, which comprises the learnable parameters of the neuron. Then, an
activation function $g$ is applied. Using the identity as activation function and
the scalar product for weight association gives a linear neuron

$$y = \sum_{i=1}^{n} w_i x_i + \theta = \langle (w, \theta), (x, 1) \rangle . \qquad (1)$$



The goal of learning is to minimize a given error function on a training set
$T := \{(x^1, t^1), \ldots, (x^m, t^m) \mid x^i \in \mathbb{R}^n,\ t^i \in \mathbb{R}\}$. Minimization is done by using
gradient descent, resulting in the update rule



^ i=\ j=l 



( 2 ) 



for the common sum-of-squared error (SSE). Composing linear neurons as shown 
in Figure 2 gives a single layer NN (SL-NN). 
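As an illustration of gradient descent on the sum-of-squared error for a linear neuron, the following minimal sketch uses the textbook batch rule $\Delta w = \eta \sum_i (t^i - y^i)\,x^i$ and $\Delta\theta = \eta \sum_i (t^i - y^i)$ for $E = \frac{1}{2}\sum_i (t^i - y^i)^2$. It is an assumption that this matches the exact form of Eq. (2); the training data, learning rate and function name are hypothetical.

```python
def train_linear_neuron(samples, eta=0.05, epochs=2000):
    """Batch gradient descent on E = 1/2 * sum (t - y)^2 for y = <w, x> + theta."""
    n = len(samples[0][0])
    w = [0.0] * n
    theta = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_theta = 0.0
        for x, t in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + theta
            err = t - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_theta += err
        # Step along the negative gradient of E.
        w = [wi + eta * g for wi, g in zip(w, grad_w)]
        theta += eta * grad_theta
    return w, theta

# Hypothetical noise-free data generated by y = 2*x1 - x2 + 0.5.
samples = [
    ((0.0, 0.0), 0.5),
    ((1.0, 0.0), 2.5),
    ((0.0, 1.0), -0.5),
    ((1.0, 1.0), 1.5),
    ((2.0, 1.0), 3.5),
]
w, theta = train_linear_neuron(samples)
print(w, theta)
```

Stacking several such neurons, one per output component, gives the rows of the weight matrix of a single layer network.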




Fig. 2. Single Layer Neural Network (SL-NN) 






Learning Geometric Transformations with Clifford Neurons 






A SL-NN computes y = Wx, whereby the matrix W is composed of the single 
weight vectors. In particular, if W is square a general linear transformation 
is computed. In addition to that fundamental limitation, SL-NNs have other 
drawbacks too. 

Suppose we want to learn an unknown transformation of an object like the one
shown in Figure 3.

Thus, the input consists of 6 points in space. We may also have additional
knowledge about corresponding points in transformed output views. However,
the only thing we can feed into a SL-NN is one big 18-dimensional input vector.
Consequently, the SL-NN also searches for the transformation in 18-dimensional
space. Reflecting on the above, we arrive at the following observations.
SL-NNs fail to incorporate any geometrical a priori knowledge, since there is no
representation of the geometric entities of objects (e.g. points and lines) available.
Actually, learning a geometric transformation means learning the action of the
corresponding transformation group on geometric entities.

In the next section we will introduce the framework of Clifford algebra to develop 
neurons that are designed in such a way and will allow powerful integration of 
geometrical a priori knowledge. 

3 Clifford Algebra and Clifford Neurons 

The drawbacks of SL-NN in computing geometric transformations seen before 
come from the underlying concept. The structural concept of a vector space is 
too weak for such purposes. An additional structure of a vector space is that 
implied by a quadratic form Q on it. From this an algebra with the desired 
properties can be constructed in the following way. 

For all $v = (v_1, \ldots, v_n) \in \mathbb{R}^n$ there exists a basis of $\mathbb{R}^n$ such that

$$Q(v) = -v_1^2 - v_2^2 - \ldots - v_p^2 + v_{p+1}^2 + \ldots + v_{p+q}^2 \qquad (3)$$

with $p, q \in \mathbb{N}$ and $p + q = n$. Thus, $Q$ is already determined by $(p, q)$ and so
is a scalar product on $\mathbb{R}^n$. Together we call this the quadratic space $\mathbb{R}^{p,q}$. In
particular, we obtain from (3) for an orthonormal basis $\{e_1, \ldots, e_n\}$

$$-Q(e_i) = +1 \quad \text{if } i \le p \qquad (4)$$

$$-Q(e_i) = -1 \quad \text{if } i > p. \qquad (5)$$

This allows the following definition of a Clifford algebra [8]. 

Definition 1 An associative algebra with unity 1 over $\mathbb{R}$ containing $\mathbb{R}^{p,q}$
and $\mathbb{R}$ as distinct subspaces is called the Clifford algebra $C_{p,q}$ of $\mathbb{R}^{p,q}$, iff

(a) $v \otimes_{p,q} v = -Q(v)$, $v \in \mathbb{R}^{p,q}$

(b) $C_{p,q}$ is generated as an algebra by $\mathbb{R}^{p,q}$

(c) $C_{p,q}$ is not generated by any proper subspace of $\mathbb{R}^{p,q}$.

Condition (a) implies for all $i, j \in \{1, \ldots, p+q\}$ with $i \ne j$

$$e_i \otimes_{p,q} e_j = -e_j \otimes_{p,q} e_i . \qquad (6)$$

From (6) and (c) it follows that the dimension of $C_{p,q}$ is $2^{p+q}$ [12] and that
$C_{p,q}$ is commutative only if $p + q \le 1$. With the notations

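$$\mathcal{A} := \{\{a_1, \ldots, a_r\} \in \mathcal{P}(\{1, \ldots, n\}) \mid 1 \le a_1 < \ldots < a_r \le n\} , \qquad (7)$$

where $\mathcal{P}(\{1, \ldots, n\})$ denotes the power set, and defining for all $A \in \mathcal{A}$

$$e_A := e_{a_1} \cdots e_{a_r} , \qquad (8)$$

we obtain a basis $\{e_A \mid A \in \mathcal{A}\}$ of $C_{p,q}$. Then the following involutions of $C_{p,q}$
can be defined

$$\hat{x} = \sum_{A \in \mathcal{A}} (-1)^{|A|}\, x_A e_A \qquad (9)$$

$$\tilde{x} = \sum_{A \in \mathcal{A}} (-1)^{\frac{|A|(|A|-1)}{2}}\, x_A e_A \qquad (10)$$

$$\bar{x} = \sum_{A \in \mathcal{A}} (-1)^{\frac{|A|(|A|+1)}{2}}\, x_A e_A . \qquad (11)$$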

After all these preliminaries we shall look at some Clifford algebras concretely.
Any Clifford algebra is isomorphic to a matrix algebra. Table 1 gives an overview
in low dimensions, containing the real numbers ($C_{0,0}$), the complex numbers
($C_{0,1}$) and the quaternions ($C_{0,2}$). In these cases we can directly define meaningful
Clifford neurons via the algebra multiplication. Complex NNs are well known
(see e.g. [7]), whereas the study of quaternionic NNs started only recently [1].
However, our focus here is on the neural computation of geometric transformations.
So far, we have only embedded quadratic spaces in Clifford algebras. Our
next goal is to determine elements of $C_{p,q}$ that act as geometric transformations.
Therefore, we have to study transformation groups that act on a special subspace
$\Lambda\mathbb{R}^{p,q} := \mathbb{R} \oplus \mathbb{R}^{p,q}$ of $C_{p,q}$.

Table 1. Low dimensional Clifford algebras

denotes direct product; R(x) denotes xxx matrices) 



Definition 2 The Clifford group Fp,q of a Clifford algebra Cp^g is defined by its 
action on as 

Tp,q := {■« C s (g)p,„ s = ±1 I Va; G AM*'’® : s (g)p,, x (8)p,<, s~^ G AM*'’®} . (12) 

This action is in fact an orthogonal transformation of _ 

Theorem 1 The map 'Tg '■ Tp,q — > 0{p, q+ 1)\ s ips is a group epimorphism, 
whereby ips denotes the function x s C)p,, x C)p,, 

Thus, tfs implemented as shown in Figure 4 gives a neuron with only one weight 
s that computes an orthogonal transformation according to Theorem 1. 
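For the quaternionic case, the forward computation of such a one-weight neuron can be sketched as follows. This is only an illustration under stated assumptions: 3D points are embedded as pure quaternions (the familiar rotation formula $x \mapsto s\,x\,s^{-1}$), which may differ from the embedding in $\Lambda\mathbb{R}^{p,q}$ used in the paper, and no learning rule is shown. All function names are hypothetical.

```python
import math

def quat_mul(a, b):
    """Hamilton product of quaternions given as (scalar, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    )

def quat_inverse(q):
    """Inverse of a non-zero quaternion: conjugate divided by squared norm."""
    n2 = sum(c * c for c in q)
    return (q[0] / n2, -q[1] / n2, -q[2] / n2, -q[3] / n2)

def rotor(axis, angle):
    """Unit quaternion for a rotation by `angle` (radians) about `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    scale = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), axis[0] * scale, axis[1] * scale, axis[2] * scale)

def clifford_neuron_forward(s, point):
    """Single-weight neuron: embed the point as a pure quaternion, return s x s^-1."""
    x = (0.0,) + tuple(point)
    y = quat_mul(quat_mul(s, x), quat_inverse(s))
    return y[1:]

# A 90 degree rotation about the z-axis maps (1, 0, 0) to (0, 1, 0).
s = rotor((0.0, 0.0, 1.0), math.pi / 2)
rotated = clifford_neuron_forward(s, (1.0, 0.0, 0.0))
print(rotated)
```

The single weight `s` has four real components, compared with nine for a general 3x3 weight matrix, which is where the constraint on the searched transformation comes from.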




Fig. 4. Clifford neuron (CN) 



Let us briefly review the first non-trivial case of quaternions $C_{0,2}$. For a more
detailed study the interested reader is referred to [2], which also discusses the
complex case that reduces to one multiplication.

Let $q_0$ denote the scalar part of a quaternion $q$ and $\mathbf{q}$ its vector part, respectively.
The update rule for the weight $s$ of a quaternionic Clifford neuron is given by

a Tp ™ 

As = -q— = - y'') d , (13) 

5 = V(sq(7o + (a • s)(7o, Sq <7 + (s • q)s + 2so(s x q) — s x q x s). For illustration 
purposes we made the following simulation with a quaternionic CN and a 3x3
SL-NN. The given task was to learn the transformation consisting of a rotation
of -60° about the axis [0.5, √0.5, 0.5] and a translation by [0.2, -0.2, 0.3] from
5 randomly chosen points with added normal noise of 20%. Figure 5 shows the
results obtained from the trained networks on a test object.

Fig. 5. Expected output test data (left), output quaternionic CN (middle) and
output SL-NN (right)



Obviously, the SL-NN just learned the noise in the data, which then resulted
in a completely wrong pose of the test object. This means that no generalization
was possible. On the other hand, the generalization performance of the
quaternionic CN was quite satisfactory. Figure 6, which shows the training and
generalization errors numerically for different noise levels, confirms these
conclusions.

Fig. 6. Training errors (left) and generalization errors (right) for different noise
levels



In the next section we will see exemplarily how the introduced Clifford neurons 
can be used to compute non-linear geometric transformations. 



4 Computing the Cross-Ratio with Clifford Neurons 



The computation of invariants of geometric transformations is fundamental in
many areas of pattern recognition and computer vision. In the latter, a
particularly interesting class is that of projective invariants. The basic projective
invariant is the cross-ratio, from which many further invariants can be derived [15].

The cross-ratio of 4 points $z, q, r, s \in \mathbb{C}$ is given by

$$[z, q, r, s] = \frac{(z - q)(r - s)}{(z - s)(r - q)} . \qquad (14)$$

The cross-ratio depends on the order of the points. The given definition is very
common, but other definitions are in use too [11]. To our knowledge, no neural
algorithms for the computation of cross-ratios have been proposed so far.

In this section we will show how this attractive non-linear task can be computed
by a single Clifford neuron. Thereby, we will use the fact that the cross-ratio
is closely related to Möbius transformations. A Möbius (or linear-fractional)
transformation is a conformal mapping of the extended complex plane of the
form

$$z \mapsto \frac{az + b}{cz + d} , \qquad (15)$$

whereby $a, b, c, d$ are complex constants such that $ad - bc \ne 0$. The group of these
transformations will be denoted by $\mathcal{M}$. A Möbius transformation is uniquely
determined by the images of 3 given points. Moreover, the cross-ratio is invariant
under Möbius transformations and, conversely, two given cross-ratios are related
by a unique Möbius transformation. Thus, one obtains the following known
theorem (see e.g. [11]).

Theorem 2 The cross-ratio $[z,q,r,s]$ is the image of $z$ under the Möbius
transformation that maps $q$ to 0, $r$ to 1 and $s$ to $\infty$, respectively.
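Theorem 2 can be illustrated directly with complex arithmetic. The sketch below is not the Clifford neuron itself: it evaluates definition (14) and reads off one choice of Möbius coefficients from it. The points $q$, $r$, $s$ are the ones used in the experiment reported later in this section; the function names are hypothetical.

```python
def cross_ratio(z, q, r, s):
    """Cross-ratio [z, q, r, s] as in definition (14)."""
    return ((z - q) * (r - s)) / ((z - s) * (r - q))

def mobius_coefficients(q, r, s):
    """One choice of (a, b, c, d) with (a*z + b)/(c*z + d) = [z, q, r, s].
    The coefficients are only determined up to a common non-zero factor."""
    return (r - s, -q * (r - s), r - q, -s * (r - q))

def mobius(z, a, b, c, d):
    """Linear-fractional map z -> (a*z + b)/(c*z + d)."""
    return (a * z + b) / (c * z + d)

q, r, s = 0.2 + 0.3j, 0.4 - 0.7j, 0.6 - 0.2j
a, b, c, d = mobius_coefficients(q, r, s)

z = 2 + 3j
print(cross_ratio(z, q, r, s))
print(mobius(z, a, b, c, d))
print(cross_ratio(q, q, r, s), cross_ratio(r, q, r, s))
```

Multiplying all four coefficients by -1 gives a = 0.2 + 0.5i, b = 0.11 - 0.16i, c = -0.2 + i, d = -0.08 - 0.64i, which describes the same transformation.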

Applying the group isomorphisms $\mathcal{M} \cong O(1,2) \cong \Gamma_{1,2}$ [12] we can study Möbius
transformations in the 8-dimensional Clifford algebra $C_{1,2}$. According to Table
1, $C_{1,2}$ is isomorphic to the matrix algebra $\mathbb{C}(2)$. First, we have to find an
appropriate coding of complex numbers in $C_{1,2}$. This coding is given by [2]

$$c(z) = \left(\mathrm{Re}(z),\ \tfrac{1}{2}(1 + z\bar{z}),\ \mathrm{Im}(z),\ \tfrac{1}{2}(1 - z\bar{z}),\ 0, 0, 0, 0\right) \qquad (16)$$

or in matrix notation by 

c»(.) = (] “I ) . (17) 

Finally, the Clifford neuron that computes the cross-ratio via the corresponding 
Mobius transformation has the form 



y = s (g)i,2 c(2() (g)i,2 s , (18) 

from which we get 

Co(j/) = a(^^' ) , (19) 

with $\lambda = |cz + d|^2$ and $z' = (az + b)/(cz + d)$ [2].

Due to the limited space we have to omit the derivation of the learning algorithm 
here. In order to test our proposed Clifford neuron we made the following little 
experiment. 

The task was to compute the cross-ratio $[z,q,r,s]$ with $q = 0.2 + 0.3i$, $r = 0.4 - 0.7i$,
and $s = 0.6 - 0.2i$. The numerical problem we had to face was of course
the coding of $\infty$, to which the point $s$ should be mapped. After some trials we
found that 10^® was a reasonable value with respect to both stability
of the learning algorithm and accuracy of the results. However, we still had to use
a very low learning rate $\eta = 0.00001$ in order to guarantee stability. With this
rate the convergence of learning in terms of epochs was slow, as reported
in Figure 7.





Fig. 7. Convergence of learning: all epochs (left) and last epochs only (right) 



However, the computational time was less than 10 seconds. The SSE dropped below
0.00001 after 30000 epochs, which gave a weight vector corresponding to the
transformation parameters in Table 2. The difference between the learned
transformation parameters and the true values was quite acceptable. As
confirmation, Table 3 gives the results obtained for some test points.



Parameter   Value            Learned parameters
a           0.2 + 0.5i       0.20019 + 0.50010i
b           0.11 - 0.16i     0.11001 - 0.15992i
c           -0.2 + i         -0.20079 + 0.99939i
d           -0.08 - 0.64i    -0.07930 - 0.64007i

Table 2. Transformation parameters



z            Value               Clifford neuron output
2 + 3i       0.3578 - 0.3357i    0.3577 - 0.3364i
4 - 7i       0.4838 - 0.3044i    0.4838 - 0.3051i
0.3-l-0.1i   0.0077-0.6884 i     -0.0082-t0.6884 i

Table 3. Cross-Ratios of test points (rounded)

The achieved accuracy could still be improved by longer learning times. The
Clifford neuron behaved very well in the experiment, so the only limitation is
machine accuracy.

5 Conclusion 

In this paper we presented novel artificial neurons in the framework of Clifford
algebra. We showed how they cover complex and quaternionic neurons and
how they are related to orthogonal transformations. The potential power of
the proposed Clifford neurons was illustrated by the example of computing the
cross-ratio with a single Clifford neuron, which is a non-linear task. Future work
will concentrate on enlarging the functional basis of Clifford neurons (projective
transformations) and then building up networks of them for applications with the
emphasis on object recognition.

Abstract

The main idea of the paper is that fast algorithms, like FFT, can
be made more efficient in the context of an algebra, rather than in the
more singular quaternion or complex algebra structures. However, the
complex algebra structure can then be recovered as a projection from the
larger algebra in which it is embedded. Namely, the 12-dimensional
algebra (hurwitzion algebra) having the basis elements associated with the
integer Hurwitz quaternions is introduced. The computational aspects of
the hurwitzion arithmetic are considered. The overlapped fast algorithms
of the two-dimensional discrete Fourier transform of an RGB image are also
developed.

Key words: quaternion algebra, hurwitzion, FFT



1 Introduction 

The basis for the well-known "overlapped" one-dimensional FFT [1] is the
possibility of obtaining additional computational advantages at the expense of the
redundancy of the representation of complex basis functions with respect to a real
input signal $x(n)$. Putting it more exactly, the possibility of constructing
overlapped algorithms exists due to the presence of a non-trivial automorphism (the
complex conjugation) of the complex field $\mathbb{C}$, acting identically upon $\mathbb{R}$.

Actually, let

$$\hat{x}(m) = \sum_{n=0}^{N-1} x(n) \exp\{2\pi i \tfrac{mn}{N}\}, \quad m = 0, \ldots, N-1, \quad N = 2^r, \quad x(n) \in \mathbb{R}. \qquad (1)$$

*This work was performed with financial support from the Russian Foundation for Basic
Research (Grant 00-01-00600).

Let us form an auxiliary sequence

$$z(n) = x(2n) + i\,x(2n+1) = x_0(n) + i\,x_1(n). \qquad (2)$$

$$\hat{z}(m) = \sum_{n=0}^{N/2-1} z(n) \exp\{2\pi i \tfrac{2mn}{N}\}, \quad m = 0, \ldots, \tfrac{N}{2} - 1.$$

Then, "partial spectra"

$$\hat{x}_0(m) = \sum_{n=0}^{N/2-1} x_0(n) \exp\{2\pi i \tfrac{2mn}{N}\}, \quad \hat{x}_1(m) = \sum_{n=0}^{N/2-1} x_1(n) \exp\{2\pi i \tfrac{2mn}{N}\} \qquad (3)$$

can be found from the relations

$$2\hat{x}_0(m) = \hat{z}(m) + \overline{\hat{z}(-m)}, \quad 2i\,\hat{x}_1(m) = \hat{z}(m) - \overline{\hat{z}(-m)}. \qquad (4)$$

The full spectrum reconstruction is realized by the relation

$$\hat{x}(m) = \hat{x}_0(m) + \hat{x}_1(m) \exp(2\pi i \tfrac{m}{N}). \qquad (5)$$
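The overlapping scheme of relations (2) to (5) can be sketched in a few lines. The example below is an illustration only: it uses a naive $O(N^2)$ DFT as a stand-in for a fast transform, keeps the positive-exponent sign convention of (1), and the function names and test signal are hypothetical.

```python
import cmath

def dft(seq):
    """Naive DFT with kernel exp(+2*pi*i*m*n/N), matching the convention of (1)."""
    n = len(seq)
    return [
        sum(seq[k] * cmath.exp(2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

def overlapped_real_dft(x):
    """Spectrum of a real, even-length sequence from one half-length complex DFT."""
    n = len(x)
    half = n // 2
    # Relation (2): pack even and odd samples into one complex sequence.
    z = [complex(x[2 * k], x[2 * k + 1]) for k in range(half)]
    z_hat = dft(z)
    spectrum = []
    for m in range(n):
        zm = z_hat[m % half]
        zc = z_hat[(-m) % half].conjugate()
        # Relation (4): separate the two partial spectra.
        x0 = (zm + zc) / 2
        x1 = (zm - zc) / 2j
        # Relation (5): recombine into the full spectrum.
        spectrum.append(x0 + x1 * cmath.exp(2j * cmath.pi * m / n))
    return spectrum

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
direct = dft(signal)
overlapped = overlapped_real_dft(signal)
print(max(abs(p - q) for p, q in zip(direct, overlapped)))
```

The separation step costs only additions and a conjugation per output sample, so the dominant cost is the single half-length complex transform.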

Note that the computer-aided transition to the complex-conjugate number does
not result in additional arithmetic operations.

The majority of fast algorithms (of Cooley-Tukey type) of the discrete
Fourier transform (DFT) have the complexity

$$W(N) = \lambda N \log_2 N + O(N),$$

where the constant $\lambda$ characterizes a particular scheme of the algorithm [5].
Thus, the complexity of the "overlapped" FFT is

$$W(N) = \tfrac{1}{2}\lambda N \log_2 N + O(N).$$

If the technique discussed above is used for multidimensional DFTs (in
particular 2D transforms) then certain difficulties may be encountered: the field $\mathbb{C}$
has "too few" automorphisms admitting repeated overlaps for each argument
with the possibility of the subsequent separation of spectra. This reasoning
leads to the necessity of immersing the field $\mathbb{R}$ into algebraic structures
possessing a sufficiently large number of trivially implemented automorphisms over
$\mathbb{R}$.

In some papers (e.g. [7], [8], [6]) I used this approach to solve the problem
of DFT fast algorithms synthesis. The idea of this approach is as follows:
we explicitly construct the immersion of the complex field into algebraic
structures having a sufficient number of trivially implemented automorphisms.
The somewhat increased computational complexity of the main arithmetic
operations in this case can be compensated by the possibility of the "overlapped"
calculation of spectrum portions. This is guaranteed by the existence of the
above automorphisms. The interpretation of the obtained results of an auxiliary
discrete transform (ADT) with values in a chosen algebraic structure is a
homomorphism. In a number of cases, the computational complexity of such an
interpretation is asymptotically negligible.

In particular, the FFTs with the overlapped two-dimensional DFT-spectrum
calculation in the four-dimensional quaternion algebra are considered in [7], [8].

Tasks similar to the reduction of DFT fast algorithm complexity (the
task of simultaneous "overlapped" calculation of (several) DFT-spectra for a
multichannel signal) are solved using similar methods.

Actually, $x_0(n)$ and $x_1(n)$ can be treated as two independent sequences.
Thus, using (3) and (4) and omitting (5) we can reconstruct the full spectrum.

In order to calculate three complex spectra simultaneously an algebra of
at least three dimensions is needed. Thus, to simultaneously calculate the
two-dimensional spectrum of a real numbers array the 12-dimensional $\mathbb{R}$-algebra
with a sufficient number of trivially implemented automorphisms is needed.
The algebra having the basis associated with the integer Hurwitz quaternions
can be used for this purpose.

2 Algebra of hurwitzions: definition and some 
properties 

According to [2], [3], [4] we shall give the following definition.

Definition 1 The algebra of hurwitzions is a twelve-dimensional $\mathbb{R}$-algebra
$\mathbf{Hu}$ with the basis $\mathcal{E}$ given by

$$\mathcal{E} = \{e, i, j, k, w, w^i, w^j, w^k, \bar{w}, \bar{w}^i, \bar{w}^j, \bar{w}^k\} \qquad (6)$$

and the multiplication rules induced by multiplication rules for the elements of
set $\mathcal{E}$ (see Table 1). The elements of the hurwitzion algebra are called the
hurwitzions.

In other words, the element i (and elements j, k) "behaves" as quaternion i
while it is not such.

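The proofs of Propositions 1-5 and Proposition 7 given below can be obtained
by direct calculations.

Proposition 1 The mapping $\Psi: \mathbf{Hu} \to \mathbb{H}$ such that

$$\Psi: e \mapsto 1, \quad \Psi: i \mapsto i, \quad \Psi: j \mapsto j, \quad \Psi: k \mapsto k;$$

$$w \mapsto \tfrac{1}{2}(-1 + i + j + k), \qquad \bar{w} \mapsto \tfrac{1}{2}(-1 - i - j - k),$$

$$w^i = i^{-1} w i \mapsto \tfrac{1}{2}(-1 + i - j - k), \qquad \bar{w}^i = i^{-1} \bar{w} i \mapsto \tfrac{1}{2}(-1 - i + j + k),$$

$$w^j = j^{-1} w j \mapsto \tfrac{1}{2}(-1 - i + j - k), \qquad \bar{w}^j = j^{-1} \bar{w} j \mapsto \tfrac{1}{2}(-1 + i - j + k),$$

$$w^k = k^{-1} w k \mapsto \tfrac{1}{2}(-1 - i - j + k), \qquad \bar{w}^k = k^{-1} \bar{w} k \mapsto \tfrac{1}{2}(-1 + i + j - k),$$

can be linearly extended up to a homomorphism of algebra $\mathbf{Hu}$ into algebra $\mathbb{H}$.

The homomorphism (projection) of the hurwitzion algebra $\mathbf{Hu}$ into the quaternion
algebra $\mathbb{H}$ transforms the elements i, j, k into the "true" quaternions.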
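The quaternion images in Proposition 1 can be checked by direct calculation. The sketch below works only in the quaternion algebra $\mathbb{H}$ (the image of the projection), not in the 12-dimensional hurwitzion algebra itself, whose multiplication table is not reproduced here. It confirms that conjugating the Hurwitz unit $\frac{1}{2}(-1+i+j+k)$ by $i$, $j$, $k$ gives the listed images; function and variable names are hypothetical.

```python
def quat_mul(a, b):
    """Hamilton product of quaternions given as (scalar, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    )

def conjugate_by(a, x):
    """Return a^-1 x a for a unit quaternion a (its inverse is its conjugate)."""
    a_inv = (a[0], -a[1], -a[2], -a[3])
    return quat_mul(quat_mul(a_inv, x), a)

I = (0.0, 1.0, 0.0, 0.0)
J = (0.0, 0.0, 1.0, 0.0)
K = (0.0, 0.0, 0.0, 1.0)

w = (-0.5, 0.5, 0.5, 0.5)      # image of w
w_bar = (-0.5, -0.5, -0.5, -0.5)  # image of the conjugate element

w_i = conjugate_by(I, w)
w_j = conjugate_by(J, w)
w_k = conjugate_by(K, w)
w_bar_i = conjugate_by(I, w_bar)

# The Hurwitz unit w has order three: w * w * w = 1.
w_cubed = quat_mul(quat_mul(w, w), w)
print(w_i, w_j, w_k, w_bar_i, w_cubed)
```

Conjugation by $i$ fixes the scalar and $i$ components and flips the signs of the $j$ and $k$ components, which is the pattern visible in the list of images.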

Proposition 2 The set

$$\mathbf{H}^*\mathbf{u} = \{xe + y(i + j + k);\ x, y \in \mathbb{R}\}$$

is a subalgebra of the algebra $\mathbf{Hu}$.

Proposition 3 The hurwitzion $W \in \mathbf{H}^*\mathbf{u}$:

$$W = e \cos\frac{2\pi}{K} + \frac{1}{\sqrt{3}}(i + j + k) \sin\frac{2\pi}{K} \qquad (7)$$

is the Kth primitive root of unit $e \in \mathbf{Hu}$. The quaternion

$$\mathfrak{w} = \Psi(W) = \cos\frac{2\pi}{K} + \frac{1}{\sqrt{3}}(i + j + k) \sin\frac{2\pi}{K} \qquad (8)$$

is the Kth primitive root of unit $1 \in \mathbb{H}$.

Proposition 4 Let

$$V = w\left(\frac{1}{\sqrt{3}} \sin\frac{2\pi}{K} - \cos\frac{2\pi}{K}\right) - \bar{w}\left(\frac{1}{\sqrt{3}} \sin\frac{2\pi}{K} + \cos\frac{2\pi}{K}\right), \qquad (9)$$

then $\Psi(W) = \Psi(V) = \mathfrak{w}$.

Proposition 5 There exists such an element $h \in \mathbb{H}$ that

$$h^{-1} \mathfrak{w} h = \cos\frac{2\pi}{K} + i \sin\frac{2\pi}{K} = \omega. \qquad (10)$$

Proposition 6 Let $X, V \in \mathbf{Hu}$ be such that

$$X = Ae + Bi + Cj + Dk + Ew + Fw^i + Gw^j + Hw^k + P\bar{w} + Q\bar{w}^i + R\bar{w}^j + S\bar{w}^k, \qquad (11)$$

$$V = xw + y\bar{w}. \qquad (12)$$

Then, the calculation of the product $XV$ requires only 16 real multiplications.

Proof. Using Table 1, we get



XV = [(i-’a; + A’y) e + (j4x + w — (i’x + Ay) w] (lil) 

+ [(-iia- + Hy) i + {-Hx - By) w“ + {Bx - By) 

+ [(-Su- + Fy)j + {Car - Sy) w’' + {-Fx - Cy) w''] 

+ [(-Qa- — Gy) k — (Dx - Qy) w-' + (-Ga; - Dy) w*] . 

Tlio el(Minrnts 0, ^ : 



C = fix — yy, y = ay + 7a;, $ = aa; + 



arc components of a three-point convolution 





Awrording to [5], [10], in orderr to calculate Cc'/jC real multiplications ax(! 
required. B 

By $\mathrm{Rot}_a(x)$ denote the mapping (an automorphism)

$$\mathrm{Rot}_a(x) = a^{-1} x a, \quad x, a \in \mathbf{Hu}.$$

Table 2 defines the $\mathrm{Rot}_a(x)$ mapping action for $x, a \in \mathcal{E}$. Columns of Table 1
are the permutations of the element (11) components induced by the $\mathrm{Rot}_a(x)$
actions.

Remark 1 The group of $\mathrm{Rot}_a(x)$ mapping actions on algebra $\mathbf{Hu}$ is easy to
interpret geometrically. Let

$$\mathfrak{M} = \{\pm\mathrm{Rot}_a(x),\ a \in \mathcal{E}\},$$

then the group $\mathfrak{M}$ is isomorphic to the group of self-coincidences of the
"quaternion cube"

$$\{\delta : \delta = \pm i \pm j \pm k\} \subset \mathbb{H}.$$

Proposition 7 Let the rows of matrix $T$ be equal to the rows of Table 2. Then
$\mathrm{Rank}\ T = 12$.

3 Fast algorithm for calculating three DFT-spectra
with overlapping in the hurwitzion algebra

Let $x^{(a)}(n_1, n_2)$ be RGB-components of a color image, $a = 0, 1, 2$; $N = 2K = 2^r$.
Let $\{\hat{x}^{(a)}(m_1, m_2)\}$ be three DFT-spectra:

_v_i ,v_i 

TAi =:f> U2—^ 

We claim that the following Theorem 1 is true.

Theorem 8 There is an algorithm for calculating three discrete two-dimensional
Fourier spectra which requires no more than

$$M^*(N^2) = N^2 \log_2 N + O(N^2) \qquad (14)$$

real multiplications for each spectrum.

An analysis of the structure of the described fast DFT algorithms allows us
to mark out six steps of this algorithm.

Step 1. Creation of an auxiliary hurwitzion-valued sequence.

By xJ;;’(ni,n 2 ),XB)(m,n 2 ) denote 

('d ' "2) = 2:^, 37 (2n-i — 2n-2 + .1?) « = 0, 1, 2; ft , 7 = 0, 1; 

XfB(m,n2) = (- 4 „'e + x['^i + (n-i, 77.2), 

X^^H«i,n2) = («ac»'2), 

X^’b(/7.i, n2) = w + + XoVw-7 + x^iVw''j (/ri, n. 2 ), 

Consider an auxiliary hurwitzion-valued sequence such that

X(?7-i,n2) = X^^\ni, n.2) - X^^^(ni,n2) - X<^)(ni,n2). 

Step 2. Calculation of an auxiliary discrete transform (ADT).

Let V be defined by (9). Consider an auxiliary hurwitzion-valued discrete
transform (ADT) such that

A’-l A'-l 

X(/,7,i,m2)= Y. 

ni=0 112 =0 



( 15 ) 
Using the Radix-2 DFT decomposition scheme [5]
1 ff-l K-^ 

X(mi , m-2) = E EE X(2ni + 0,277,2 + 

0 ?ii_0n2— 0 

we get the estimate of multiplicative complexity

$= 12 K^2 \log_2 K + O(K^2).$
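The radix-2 idea behind this estimate can be sketched in one dimension with ordinary complex numbers. This is only an illustration under stated assumptions: the paper applies the decomposition to a two-dimensional, hurwitzion-valued transform, whereas the hypothetical functions below split a complex sequence into even and odd samples and compare the result with a direct DFT.

```python
import cmath


def dft_naive(x):
    """Direct DFT with O(n^2) complex multiplications."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # transform of samples x[0], x[2], ...
    odd = fft_radix2(x[1::2])    # transform of samples x[1], x[3], ...
    out = [0j] * n
    for m in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]  # twiddle factor
        out[m] = even[m] + t
        out[m + n // 2] = even[m] - t
    return out
```

Each level of the recursion costs on the order of n multiplications and there are log2(n) levels, which is where the logarithmic factor in the complexity estimate comes from.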



Step 3. Separation of the "partial ADT-spectra".

It follows from Proposition 7 that the "partial ADT-spectra"

K-^ K-^ 

Xj? . rn-i) ’ n2) , o = 0, 1 , 2; /1, 7 = OU 

rti “0 vt2~0 

are uniquely determined by the system of equations



Rot„ ^X(mi . 777 , 2 )^ = Rotj, 



/'K-l K-1 

X( 77 ,i, 772 )V "^"^1 + "^”’^ 

0 Tf. 2—0 



hlg£. 



For the calculation of the partial ADT-spectra only $O(N^2)$ real arithmetical operations are
required.

Step 4. Separation of the "partial quaternion spectra".

Under the conditions of Proposition 4 the mapping $\Psi$ transforms the partial ADT-spectra
into

7 V '-1 7 C -1 

m 2 ) = E E « = 0’ 1^2; .^’7 = 0, 1 

V7.1 —0 7J.2— 

The mapping requires no real multiplications.

Step 5. Separation of the "partial complex spectra".

Under the conditions of Proposition 5 the mapping

X.^‘’(w1,-7772) I 7 h ^X|“\777,i, 7772)/7 

transforms X^?/('77?.i, 777 - 2 ) hito 
ft'-l 7<'-l 

x^(mi , 777.2) = E E > 77.2)a.-"^'”^-"^'"E « = 0, 1, 2; /it , 7 = 0,1. 

m=0 ri2=0 

Only $O(N^2)$ real arithmetical operations are required to calculate these spectra.



Step 6. Full DFT-spectra reconstruction.







The full DFT-spectra can be reconstructed by

1 

;3,7-0 

This reconstruction requires $O(N^2)$ real arithmetical operations.

Finally, summing the complexity estimates for Steps 1-6, we obtain (14).

In [9], a similar task was solved using the quaternion DFT (QDFT):

A'-1 

C)(mi , m 2 ) = ^ Q{n^ , n 2 )e^'^-' , (16) 

Tf \ * 0^2 - 0 



where 

$$Q(n_1, n_2) = \left( \mathbf{i} R(n_1, n_2) + \mathbf{j} G(n_1, n_2) + \mathbf{k} B(n_1, n_2) \right).$$

The transform (16) is introduced in [7] as an auxiliary transform, and is analyzed
as a self-sufficient transform in [16], [12]. The applications of QDFT
to “anisotropic” tasks are discussed in [11]. The RGB-spectra calculation
method proposed in [9] requires

M*(A^') = |A'"log2iV + 0(Af") 

real multiplications. This is almost three times worse than the method
considered above.



Remark 2 Note that there exist 2D FFTs with slightly better computational
complexity characteristics. However, these algorithms can hardly be used
by an ordinary user because of their complex structure. Despite its non-triviality,
the algorithm introduced above uses only standard computational means
(Radix-2 or row-column FFT, solution of a system of 12 linear equations, etc.).
The main distinction is in the non-complex Hu algebra arithmetic.

4 Conclusion 

In the author's opinion, the capabilities of the approach described in this paper
are not limited to the applications considered above.

In [14] the author used a similar approach to develop the following DFT
algorithms.

• Fast algorithm for calculating three two-dimensional Fourier spectra with
overlapping in the eight-dimensional group algebra $A(\mathbb{R}, D_4)$ with

= I Ar 2 log^ + 0( A'2), 

where $D_4$ is the eight-element dihedral group.
• Fast algorithms for calculating five complex Fourier spectra of complex
data with overlapping in the six-dimensional group algebra $A(\mathbb{R}, S_3)$ with

Ar(A0 = ^A'log,A + O(A0, 

where S3 is the six-element permutation group. 



It seems to be quite interesting to consider the 120-dimensional icosian algebra,
which allows one to calculate a DFT with 60-fold overlapping. The basis
of this algebra is associated with the so-called icosians (integer quaternions over
$\mathbb{Q}(\sqrt{5})$ (see [4], [15])).

i (±1,0,0, or, i(±l,±l,±l,=ir , i(0,±l,±rT,±rr, 

where by $(x, y, z, t)$ we denote the quaternion $x + \mathbf{i}y + \mathbf{j}z + t\mathbf{k}$,



1 


/Z"\ 


1 / fz\ 


II 

to 1 


- \^j 1 


r = -(l±v/5) 



($\alpha$ is the permutation action, $\alpha \in A_4 \subset S_4$, and $A_4 \subset S_4$ is the subgroup of
even permutations).

Certainly, significant emphasis should be laid on the development of multiplication
rules like those of Proposition 6 and the development of an appropriate
mathematical formalism.

One of the main goals of this article is to attract researchers' attention
to the development of such a formalism.
Abstract. We present a novel extension of the Mumford-Shah functional that
allows statistical shape knowledge to be incorporated at the computational
level of image segmentation. Our approach exhibits various favorable
properties: non-local convergence, robustness against noise, and the ability
to take into consideration both shape evidence in given image data and
knowledge about learned shapes. In particular, the latter property
distinguishes our approach from previous work on contour-evolution based
image segmentation. Experimental results confirm these properties.

Keywords: image segmentation, shape recognition, statistical learning, 
variational methods, diffusion, active contour models, diffusion-snake 



1 Introduction 

Robust shape recognition is a fundamental task in a perception-action cycle. 
Two fundamental issues in shape recognition are image segmentation and the 
representation and incorporation of previously learned shape knowledge in order 
to cope with missing and imperfect data. 

Variational methods provide a conceptually clear approach to image segmen- 
tation. They have been studied by Mumford and Shah [10] and others [1,11,12,9]. 
The basic idea is to approximate a given grey-value image with a piecewise 
smooth function by minimizing a suitable functional. There has been recent in- 
terest in knowledge acquisition through learning by examples, and appropriate 
modifications of the Mumford-Shah functional were proposed [14,13]. 

In this paper, we present a novel extension of the Mumford-Shah functional 
in order to combine powerful bottom-up diffusion filtering with top-down in- 
corporation of statistical knowledge about previously learned shapes. As a first 
step, shape knowledge is represented in terms of the principal components of 
samples in a linear vector space of contours. The principal component analysis 
(PCA) was shown to be quite effective in modeling shape variation [3,4, 2, 7]. Its 
combination with variational segmentation, however, is new. 
Our paper is organized as follows: in Section 2 we extend the Mumford-Shah
functional by an energy term representing statistical shape knowledge. The
underlying curve representation and statistical learning process is explained in 
Section 3. Section 4 describes the curve evolution scheme governing the snake 
and its interaction with diffusion filtering. The key properties of our approach 
are demonstrated by numerical results in Section 5. 



2 Extending the Mumford-Shah Functional

The well-known approach to image segmentation suggested by Mumford and 
Shah [10] is to compute a segmented version u{x) of an input image f{x) by 
minimizing the following energy functional: 

$$E_{MS}(u, C) = \frac{1}{2}\int_{\Omega} (f - u)^2 \, d^2x + \frac{\lambda^2}{2}\int_{\Omega - C} |\nabla u|^2 \, d^2x + \nu\,|C| , \qquad (1)$$

where $\Omega \subset \mathbb{R}^2$ denotes the image plane. This functional is simultaneously
minimized with respect to the segmented image $u$ and a contour $C$, across which $u$
may be discontinuous.

The three energy terms in (1) express the following constraints: the segmented
image $u$ should approximate the input image $f$; it should be smooth,
but jumps are allowed at the contour $C$; and the contour length $|C|$ should be
minimal. $\lambda$ and $\nu > 0$ are weighting factors.
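To make the three terms tangible, the following sketch evaluates a discretized version of the functional on a pixel grid. It is a minimal illustration under assumptions that are not in the text: unit pixel spacing, forward differences, a per-pixel mask `w` that is 0 on contour pixels and 1 elsewhere, a smoothness weight passed as `lam2`, and a contour length supplied by the caller. The function name is hypothetical.

```python
def mumford_shah_energy(f, u, w, lam2, nu, contour_length):
    """Discrete Mumford-Shah energy on a 2D grid (lists of rows).

    f, u : input image and segmented image
    w    : mask, 0 on contour pixels (smoothness switched off), 1 elsewhere
    lam2 : weight of the smoothness term (assumed name)
    nu   : weight of the contour length term
    """
    rows, cols = len(f), len(f[0])
    data = 0.0
    smooth = 0.0
    for i in range(rows):
        for j in range(cols):
            data += (f[i][j] - u[i][j]) ** 2
            # Forward differences; taken as zero at the image border.
            ux = u[i][j + 1] - u[i][j] if j + 1 < cols else 0.0
            uy = u[i + 1][j] - u[i][j] if i + 1 < rows else 0.0
            smooth += w[i][j] * (ux * ux + uy * uy)
    return 0.5 * data + 0.5 * lam2 * smooth + nu * contour_length
```

For a step image, placing the contour on the step removes the smoothness penalty and replaces it by the length penalty, which is the trade-off the weighting factors control.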

In this paper, we propose to extend this energy functional to 

$$E(u, C) = E_{MS}(u, C) + \alpha\, E_c(C), \qquad (2)$$

where the contour energy term $E_c(C)$ accounts for previously learned shape
information. Minimization with respect to $C$ should then favor contours which the
segmentation process is “familiar” with due to the acquired shape knowledge.

How we describe shape, how we acquire shape statistics, and how we incorporate
these in the contour energy $E_c$ will be explained next.



3 Modeling Shape Knowledge 

3.1 Shape Representation 

In this paper we represent the shape of an object as a quadratic B-spline curve. 
We chose this representation for the following reasons: a quadratic spline curve 
gives a simple mathematical description of an object contour. Moreover, it is 
differentiable such that curvature and normal vectors can easily be determined. 
The number of control points can be adapted to obtain a detailed contour de- 
scription in terms of a moderate amount of data per shape. In particular, the 
shape statistics of various objects can conveniently be learned by examining the 
distribution of control points in a finite-dimensional vector space. 
Thus, in this paper a contour will be a closed planar curve of the form

$$C(s) = \sum_{n=1}^{N} \begin{pmatrix} x_n \\ y_n \end{pmatrix} B_n(s), \qquad (3)$$

where $(x_n, y_n)$ are the control points and $B_n(s)$ are quadratic periodic B-spline
basis functions [5,2].

Given the shape of an object as a binary image, the spline curve is automatically
fitted around the object contours. As a result, the object's shape is
represented by the set of control points $\{(x_n, y_n)\}_{n=1..N}$, for which we use the
compact notation:

$$z = (x_1, y_1, \dots, x_N, y_N)^t \qquad (4)$$

Each shape therefore corresponds to a vector $z \in \mathbb{R}^{2N}$.
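The representation can be sketched as follows. The code samples a closed curve from its control points with the standard uniform quadratic B-spline segment weights and flattens the control points into one shape vector. The uniform knot spacing, the sampling density and the function names are assumptions for illustration; the automatic fitting of the spline to a binary image is not shown.

```python
def closed_quadratic_bspline(points, samples_per_segment=8):
    """Sample a closed uniform quadratic B-spline given (x, y) control points."""
    n = len(points)
    curve = []
    for i in range(n):
        # Each segment is influenced by three consecutive control points;
        # indices wrap around so that the curve closes.
        p0, p1, p2 = points[i], points[(i + 1) % n], points[(i + 2) % n]
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            b0 = 0.5 * (1 - t) ** 2
            b1 = 0.5 * (-2 * t * t + 2 * t + 1)
            b2 = 0.5 * t * t            # b0 + b1 + b2 == 1 for every t
            curve.append((
                b0 * p0[0] + b1 * p1[0] + b2 * p2[0],
                b0 * p0[1] + b1 * p1[1] + b2 * p2[1],
            ))
    return curve


def shape_vector(points):
    """Flatten control points into (x1, y1, ..., xN, yN)."""
    return [c for p in points for c in p]
```

With N control points the shape vector has 2N entries, so a set of training shapes becomes a point cloud in a 2N-dimensional space.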

3.2 Shape Statistics 

We now assume that numerous examples of the appearance of an object are
given by a set of shapes $\mathcal{X} = \{z_1, z_2, \dots\}$ with $z_i \in \mathbb{R}^{2N}$ for all $i$. To analyze
the variation within this set of shapes, we perform a principal component analysis
(PCA) of the points $z_i$ (cf. [3,4]). That is, we determine the mean $z_0 = E[z_i]$
and the covariance matrix

$$\Sigma = E\left[ (z_i - z_0)(z_i - z_0)^t \right], \qquad (5)$$

where $E[\cdot]$ denotes the sample average over the shapes $z_i$ in our learning set $\mathcal{X}$.
This is equivalent to modeling the distribution of shapes in $\mathbb{R}^{2N}$ as a Gaussian
distribution. The eigenvectors associated with the largest eigenvalues $\sigma_i^2$ of the
covariance matrix $\Sigma$ are the (significant) principal components and indicate the
directions of largest shape variation. The square root $\sigma_i$ of the eigenvalue is a
measure for the amount of shape variation in a given direction.
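A minimal sketch of these statistics, using only the standard library: the mean and covariance are sample averages as described above, and the leading principal component is found by power iteration. Power iteration is a stand-in chosen for brevity, not necessarily the eigen-solver used by the authors, and the function names are hypothetical.

```python
import math


def mean_and_covariance(shapes):
    """Sample mean and covariance (averaging over the learning set)."""
    m, d = len(shapes), len(shapes[0])
    z0 = [sum(z[k] for z in shapes) / m for k in range(d)]
    cov = [
        [sum((z[a] - z0[a]) * (z[b] - z0[b]) for z in shapes) / m
         for b in range(d)]
        for a in range(d)
    ]
    return z0, cov


def leading_eigenpair(cov, iterations=200):
    """Largest eigenvalue and its eigenvector by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return lam, v
```

The square root of the returned eigenvalue is the standard deviation of the training shapes along the returned direction.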

The shape variation can be visualized by sampling along an eigenvector in 
both directions from the mean. As an example, we present in Figure 1 an arbi- 
trary set of six different ellipses (shown on the left) and a visualization of the first 
two principal components (middle and right images). In this example, the visu- 
alization allows for a very intuitive interpretation of the principal components: 
the first one describes a variation in the size and the second one a variation in 
the aspect ratio of the ellipses. Thus, the principal component analysis imposes 
an order on the learned shapes, a ranking by importance, and simultaneously 
induces a compact description of shape variation. Moreover, having learned a 
number of shapes in this way, the covariance matrix allows us to define a probability
measure on the shape space. If the covariance matrix $\Sigma$ is of full rank,
then the inverse $\Sigma^{-1}$ exists and the Gaussian probability distribution of shapes
$z$ is

$$\mathcal{P}(z) \propto \exp\left( -\frac{1}{2} (z - z_0)^t\, \Sigma^{-1} (z - z_0) \right). \qquad (6)$$
Fig. 1. On the left are shown six arbitrary ellipses. The middle and right images
show the first and second principal component: sampling was done around the 
mean contour by two standard deviations in both directions 



In general, however, the covariance matrix $\Sigma$ may not have full rank, such that
an inverse is not defined. Therefore, we replace it by a modified pseudoinverse
$\Sigma^*$, which we construct as follows: assume the square roots of the eigenvalues
of $\Sigma$ to be ordered such that $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r$, where $r$ is the rank of $\Sigma$. We
define the modified pseudoinverse as



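$$\Sigma^* := T \begin{pmatrix} \sigma_1^{-2} & & \\ & \ddots & \\ & & \sigma_r^{-2} \end{pmatrix} T^t + \sigma_\perp^{-2} \left( I - T\,T^t \right), \qquad (7)$$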



where $T$ is the $2N \times r$ matrix, the columns of which are the first $r$ eigenvectors
of $\Sigma$. The constant $\sigma_\perp$ was introduced to account for all directions orthogonal
to the eigendirections. It should be of the order of

The geometric meaning of this modified pseudoinverse is the following: if
the rank of $\Sigma$ is less than the dimension $2N$ of the parameter space, then the
assumed Gaussian distribution does not extend into the space orthogonal to the
eigendirections. This means that shapes lying outside the eigenspace are not
admissible. By adding the second term, the Gaussian distribution is artificially
expanded into the orthogonal space. This also allows evidence in the given image
data that does not correspond to the learned shape knowledge to be taken into
consideration. Enforcing $\sigma_\perp \leq \sigma_r$ would, for example, mean that the probability
of a shape variation outside the learned shape space is not necessarily zero, but
smaller than or equal to any shape variation encountered within the set of learned
shapes. Decreasing $\sigma_\perp$ will continually suppress all unknown shape variations.



3.3 Shape Energy Functional 

From the previously constructed shape distribution (cf. equation (6)) and the 
Gibbs identity 



$$\mathcal{P}(z) \propto \exp(-E(z)), \qquad (8)$$
Fig. 2. Energy plots. The left side shows a schematic plot of the contour energy
(9): learned directions (eigenvectors of the covariance matrix $\Sigma$) are represented
by $x$ and orthogonal directions by $y$. The right image shows the image
energy $E_{MS}$, the contour energy and the sum of both for a fixed input image
as a function of different contours. For the parameter values $-1, 0, 1$, the three
respective contours are shown below

we derive the following shape energy functional: 

$$E_c(z) = \frac{1}{2} (z - z_0)^t\, \Sigma^* (z - z_0). \qquad (9)$$

To visualize the meaning of the orthogonal correction $\sigma_\perp$ in the definition
of $\Sigma^*$, Figure 2, left side, shows a schematic plot of the shape energy. For the
purpose of clarity the shape space is reduced to two dimensions — a learned 
direction x and an orthogonal direction y. The increase of shape energy in the 
learned directions is smaller than the one in orthogonal directions, such that 
the entire space of shape variations is basically reduced to an elastic tunnel of 
familiar shapes. Restricting the shape variability to this tunnel amounts to a 
drastic reduction in the effective dimensionality of the search space. 
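The "elastic tunnel" can be illustrated numerically. The sketch below evaluates a quadratic shape energy without forming any matrix, assuming that the modified pseudoinverse weights each learned eigendirection by the inverse eigenvalue and the orthogonal remainder by the inverse square of the orthogonal correction. That specific form, the requirement of orthonormal eigenvectors, and all names are assumptions made for illustration.

```python
def shape_energy(z, z0, eigvecs, sigmas, sigma_perp):
    """Quadratic shape energy 0.5 * d^t S d with d = z - z0.

    eigvecs    : orthonormal learned directions (list of vectors)
    sigmas     : standard deviations along those directions
    sigma_perp : assumed standard deviation for all orthogonal directions
    """
    d = [a - b for a, b in zip(z, z0)]
    learned = 0.0
    projected_sq = 0.0
    for t, s in zip(eigvecs, sigmas):
        c = sum(ti * di for ti, di in zip(t, d))  # coordinate along t
        learned += c * c / (s * s)
        projected_sq += c * c
    # Squared length of the part of d outside the learned subspace.
    residual_sq = sum(di * di for di in d) - projected_sq
    return 0.5 * (learned + residual_sq / (sigma_perp ** 2))
```

With one learned direction of standard deviation 2 and an orthogonal correction of 0.5, a unit step along the learned direction costs 0.125 while the same step in the orthogonal direction costs 2.0, so deformations are cheap only inside the tunnel.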

Adding the contour energy (9) to the Mumford-Shah functional (1) we obtain 
the total energy 

$$E(u, C(z)) = E_{MS}(u, C(z)) + \alpha\, \frac{1}{2} (z - z_0)^t\, \Sigma^* (z - z_0), \qquad (10)$$

which is now a function of both the image $u$ and the contour control points $z$.
Increasing the parameter $\alpha > 0$ allows a continuous shift from an image-based
segmentation to a knowledge-based segmentation. On the other hand, the limit
$\alpha \to 0$ results in a pure Mumford-Shah segmentation for closed spline contours.
The effect of the contour energy upon the total energy can be seen in Figure 2,
right side: image energy $E_{MS}$, contour energy $E_c$ and total energy are plotted
for various contours. The middle contour is the one which optimally describes
the input image.

Several favorable properties are obtained by adding the contour energy: the
basin of attraction is greatly increased. For sufficiently large learning strength
$\alpha$, the total energy will be convex. Moreover, the position of the minimum
can be shifted: this leads to a minimizing contour which displays a compromise
between knowledge-based and data-driven segmentation. This effect will
be demonstrated by examples in Section 5.

4 Implementation 

The total energy in (10) needs to be minimized with respect to both the seg- 
mented image u and the spline control points z defining the shape of the seg- 
menting contour in the image plane. This is done by iterating two fractional 
steps: the first minimizes (10) with respect to the contour C (Subsection 4.1), 
the second minimizes (10) with respect to the image u (Subsection 4.2). 



4.1 Curve Evolution 



As shown in [10], minimizing $E_{MS}(u, C)$ with respect to the contour $C$ leads to
the equation

$$e^+(s) - e^-(s) - \kappa(s) = 0 \quad \forall s. \qquad (11)$$

Here, $s$ is the curve parameter, $\kappa$ is the curvature of the contour, $e^+$ and $e^-$ refer
to the integrand in $E_{MS}(u, C)$ on the outside and the inside of the contour $C$,
respectively. Minimization can be implemented by a curve evolution equation
with an artificial time parameter $t$:

$$\frac{\partial C(s,t)}{\partial t} = \left( e^+(s,t) - e^-(s,t) - \kappa(s,t) \right) n(s,t) \quad \forall s, \qquad (12)$$

such that the contour is modified along its outer normal vector n(s,t) only. 

Inserting the contour definition (3) results in an evolution equation for the 
control points z. Without loss of generality, this will be described for the x- 
components only: 

$$\sum_{m=1}^{N} \frac{dx_m(t)}{dt}\, B_m(s) = \left( e^+(s,t) - e^-(s,t) - \kappa(s,t) \right) n_x(s,t) \quad \forall s, \qquad (13)$$

where $B_m(s)$ are the spline basis functions and $n_x$ denotes the $x$-component of
the normal vector.

This equation has to be satisfied for every point along the curve. The description
of the contour as a spline curve induces a discretization of (13) along the
parameter $s$ with the nodes $s_i = (i + 1/2)/N$, $i = 1, \dots, N$, which are the maxima
of the respective basis functions $B_i(s)$. We end up with a set of $N$ coupled
linear differential equations:

$$\sum_{m=1}^{N} B_{mi}\, \frac{dx_m(t)}{dt} = \left( e^+(s_i,t) - e^-(s_i,t) - \kappa(s_i,t) \right) n_x(s_i,t). \qquad (14)$$
The coefficients $B_{mi} = B_m(s_i)$ form an $N \times N$ tridiagonal matrix $B$. Inversion
leads to the evolution equation for the control points. Adding the term induced
by the statistical shape information (9)

$$\frac{dE_c}{dz} = \Sigma^* (z - z_0), \qquad (15)$$

we obtain the evolution equation of the control points for the total energy (10): 

(e+(si,t) - e~{s^,t) - K{si,t)) n^{s^,t) 

- [E* {z - zo)],^_, . (16) 

In the corresponding equation for the $y$-components, $n_x$ has to be replaced by
$n_y$ and $[\Sigma^* (z - z_0)]_{2m-1}$ by $[\Sigma^* (z - z_0)]_{2m}$.

4.2 Inhomogeneous Linear Diffusion 

To minimize the total energy E(u, C) in (10) with respect to the image u, we 
rewrite the Mumford-Shah functional: 

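$$E_{MS}(u, C) = \frac{1}{2}\int_{\Omega} (f - u)^2 \, d^2x + \frac{\lambda^2}{2}\int_{\Omega} w_c(x)\, |\nabla u|^2 \, d^2x + \nu\,|C|. \qquad (17)$$

The contour dependence is now implicitly represented by an indicator function

$$w_c : \Omega \to \{0, 1\}, \qquad w_c(x) = \begin{cases} 0 & \text{if } x \in C \\ 1 & \text{otherwise} \end{cases}. \qquad (18)$$

For fixed $C$, the corresponding inhomogeneous linear diffusion equation

$$\frac{\partial u}{\partial t} = -\frac{dE_{MS}}{du} = (f - u) + \lambda^2\, \nabla \cdot \left( w_c(x)\, \nabla u \right) \qquad (19)$$

converges to the minimum of (17) for $t \to \infty$.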

We used a modified explicit scheme to solve (19), and iterated the contour 
evolution equation (16) alternatingly. 
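The diffusion step can be sketched with a plain explicit Euler scheme on a pixel grid. This is not the authors' modified explicit scheme, whose details are not given here; it is an illustration assuming unit grid spacing, a smoothness weight named `lam2`, a mask `w` that is 0 on contour pixels, and a diffusivity between two neighbouring pixels taken as the product of their mask values. For stability the time step should satisfy roughly tau * (1 + 4 * lam2) <= 1.

```python
def diffuse(f, w, lam2, tau, steps):
    """Explicit iteration of u_t = (f - u) + lam2 * div(w grad u)."""
    rows, cols = len(f), len(f[0])
    u = [row[:] for row in f]          # start from the input image
    for _ in range(steps):
        new = [row[:] for row in u]
        for i in range(rows):
            for j in range(cols):
                flux = 0.0
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        # No exchange if either pixel lies on the contour.
                        g = w[i][j] * w[ni][nj]
                        flux += g * (u[ni][nj] - u[i][j])
                new[i][j] = u[i][j] + tau * ((f[i][j] - u[i][j]) + lam2 * flux)
        u = new
    return u
```

With the mask set to zero along a contour, grey values on the two sides cannot mix, so smoothing stays confined to each region; with the mask equal to one everywhere the same code performs ordinary linear diffusion with a data-fidelity term.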

5 Results 

Implementation Details. For the set of six ellipses, shown in Figure 1 on
the left, mean contour and covariance matrix were determined. The modified
pseudoinverse $\Sigma^*$ (equation (7)) was determined, and the parameter for orthogonal
correction was chosen as $\sigma_\perp = 1.2\, \sigma_r$. The contour was initialized to the mean
contour and the following steps were iterated: calculate the indicator function $w_c(x)$
(equation (18)) from the current contour $C$, diffuse the image $u$ with the inhomogeneous
diffusivity $w_c(x)$ (equation (19)), update the contour control points
(equation (16)) and determine the new contour $C$ (equation (3)). This procedure
Fig. 3. The input image is shown on the left. The information energy was constructed
upon the 6 ellipses of Figure 1 for the top row and a set of bars for the
bottom row. For three strength values $\alpha = 2000, 22000, 840000$ (top row) and
$\alpha = 300, 2000, 50000$ (bottom row) the final contour was determined. Increasing
the knowledge energy, the overlapping part (bar or ellipse) is continuously
removed

was done for various values of the parameter $\alpha$, which determines the relative
strength of the knowledge term in the energy functional (equation (10)).

Image-based vs. knowledge-based segmentation. The input image - a
rectangle overlapping an ellipse - and the resulting contours $C$ for the three
different values of $\alpha$ are shown in Figure 3, top row. To elaborate the effect of
the acquired shape statistics, we analyzed the same input image - Figure 3 left
- but this time we constructed the contour energy based on a different set of
learned images, namely a set of bars.

The results show that for increasing knowledge strength parameter $\alpha$, the segmentation
process continuously ignores shapes that do not correspond to learned
shape statistics. Note, however, that the resulting contour does not correspond
simply to the mean shape, i.e. the most probable shape. The segmentation process
still incorporates evidence given by the input image data.

Similar results are obtained (for fixed $\alpha > 0$) by continuously decreasing the
parameter $\sigma_\perp$, which suppresses all contour motion in directions orthogonal to
the learned ones.

Non-local convergence and noise robustness. In a second example, we 
show the robustness of the method against noise and its independence of the 
initial contour. We trained the system with the set of ellipses. The input image 
is an ellipse (which, again, is not the mean contour of the training set) with 75% 
noise, that is three out of four pixels were replaced by an arbitrary grey value. 
Fig. 4. Robustness: The contour energy is constructed upon the set of ellipses
on the left. The input image (middle) is an ellipse with uniformly distributed 
noise where 75% of the pixels were replaced by arbitrary grey values. The right 
side shows the final contour. The initialization and iteration process is depicted 
in Figure 5 



Figure 4 shows that the correct segmentation is obtained despite the large 
amount of noise. Figure 5 depicts three samples of the dynamic segmentation 
process: the diffused images u and the associated contours are plotted at various 
times. Note, that the initial shape (left) is “far away” from the correct shape 
(right). Nevertheless, the process converges. This behavior corresponds to the 
shape of the energy functional depicted in Figure 2, right: shape knowledge 
increases the basin of attraction considerably. 




Fig. 5. From left to right: a series of steps in the minimization process showing 
the diffused image and the current contour (in black) for the noisy input image in 
Figure 4. The parameters were chosen as $\lambda = 400$, $\nu = 800$, $\alpha = 100000$, $\sigma_\perp = 0.4\, \sigma_r$.
Note that the final contour (bottom right) does not correspond to the mean
shape of the learning set 




Diffusion-Snakes Using Statistical Shape Knowledge 






6 Conclusion 

We presented a novel extension of the Mumford-Shah functional which allows
statistical shape knowledge to be integrated already at the computational level of image
segmentation. As a result, we obtained a snake-like segmentation approach which
exhibits the following favorable properties over classical snake approaches [2,6,8]:

(i) Non-local convergence and noise robustness: these properties are due to the 
diffusion filtering, which is a global smoothing process interacting over large 
distances within the image plane. By contrast, the region of convergence of 
traditional snake approaches is restricted to narrow valleys typically defined 
by locally estimated gradients of noisy image data. 

(ii) The forces driving the snake do not solely depend on signal transitions of 
given image data, but also take into consideration previously experienced 
appearances of known objects. 

Our method allows a continuous shift from an image-based to a knowledge-based
segmentation. First experiments indicate its high robustness against noise.

Our further work will focus on more sophisticated representations of statistical
shape information going beyond standard PCA, and on deriving meaningful
criteria to determine the “knowledge strength” parameter $\alpha$ automatically.
Abstract. In signal processing, the approach of the analytic signal is a
capable and often used method. For signals of finite length, quadrature
filters yield a bandpass filtered approximation of the analytic signal. In
the case of multidimensional signals, the quadrature filters can only be
applied with respect to a preference direction. Therefore, the orientation
has to be sampled or steered, or orientation adaptive filters have to
be used. Up to now, there has been no linear approach to obtain an
isotropic analytic signal, which means that the amplitude is independent
of the local orientation. In this paper, we present such an approach using
the framework of geometric algebra. Our result is closely related to the
Riesz transform and the structure tensor. It is seamlessly embedded in the
framework of Clifford analysis. In a suitable coordinate system, the filter
response contains information about local amplitude, local phase and
local orientation of intrinsically one-dimensional signals. We have tested
our filters on two- and three-dimensional signals.



Keywords: quadrature filter, analytic signal, Riesz transform 



1 Introduction 

In image and image sequence processing, different paradigms of interpreting the
signals exist. Regardless of whether they follow a constructive or an appearance-based
strategy, they all need a capable low-level preprocessing scheme.

For one-dimensional signals, the analytic signal and the quadrature filters are 
capable theoretical and practical methods, respectively. The analytic signal codes 
the local properties of structure in an optimal way. Using quadrature filters, it 
is simple to detect steps and spikes in the signal. 

Accordingly, in image processing, the detection of edges and lines is a fre- 
quently discussed topic, which suffers from the fact that there has been no odd 
filter with isotropic energy up to now (e.g. [12]). The corresponding problem in 

* This work has been supported by German National Merit Foundation and by DFG 
Graduiertenkolleg No. 357 (M. Felsberg) and by DFG So-320-2-2 (G. Sommer). 
the frequency domain is that one cannot define positive and negative frequencies
(see [7]), such that it is not possible to create a 'real' 2D quadrature filter.

To overcome this problem, several approaches have been developed in the past,
all using the quadrature filters with respect to a preference direction:

1. orientation adaptive filtering using the structure tensor, e.g. [7,8] 

2. sampling of the orientation, e.g. [12,13] 

3. steerable filters, e.g. [5,16] 

The first two approaches are non-linear and the corresponding algorithms have 
high complexities (compared to convolutions). The steerable filters are linear and 
fast, but they are not related to a generalized analytic signal and only yield ap- 
proximative quadrature filters with steered preference direction. In our opinion, 
the structure tensor is the method which is closest to a generalized quadrature 
filter. It is isotropic but not linear because the phase is neglected. Actually, the 
phase contains all information about the characteristic of structure [18]! 

Therefore, one should keep the phase, which is automatically fulfilled if a 
linear approach is developed. Since the preprocessing is only the first link in 
a long chain of operations, it is also useful to have a linear approach, because 
otherwise it would be nearly impossible to design the higher-level processing 
steps. If the preprocessing is linear, one can consider simple cases because the 
effect in a more complex signal is simply the sum of the parts. 

On the other hand, we need a rich representation if we want to treat as much 
as possible in the preprocessing stage. Furthermore, the representation of the 
signal during the different operations should be complete, in order to prevent 
a loss of information. These constraints force us to leave the approach of
complex analysis and to use the framework of geometric algebra instead, which
is also advantageous if we combine image processing with neural computing and
robotics (see [20]).

In this paper, we introduce a new approach for the 2D analytic signal which
enables us to substitute the structure tensor by an entity which is linear,
preserves the split of the identity and has a geometrically meaningful representation.
We have overcome the problem of odd filters in higher dimensions; the resulting
method is of low complexity and is naturally embedded as a generalized analytic
signal in the field of Clifford analysis.

2 Fundamentals 

Since we work on signals in Euclidean space ($\mathbb{R}^n$), we have to use the geometric
algebra $\mathbb{R}_{0,n}$. That is, for 1D signals we use $\mathbb{R}_{0,1}$ (isomorphic to the algebra of
complex numbers), for image processing we use $\mathbb{R}_{0,2}$ (isomorphic to the algebra
of quaternions $\mathbb{H}$), and for image sequences we use $\mathbb{R}_{0,3}$. The classical complex
signal theory naturally embeds in these algebras, since the algebra of complex
numbers can be considered as a subalgebra.

The base vectors of $\mathbb{R}^n$ are denoted

$e_1, e_2, \dots, e_n$ where $e_k e_k = -1$, $k \in \{1, \dots, n\}$

and the base elements of $\mathbb{R}_{0,n}$ are denoted

$e_0, e_1, e_2, \dots, e_n, e_{12}, e_{13}, \dots, e_{123}, \dots$

where $e_0$ is the base element for the scalar part, i.e. it commutes with all base
elements and squares to 1. The subspace of $\mathbb{R}_{0,n}$ consisting of $k$-vectors is denoted
$\mathbb{R}_{0,n}^k$. The conjugation (inversion and reversion) of $a \in \mathbb{R}_{0,n}$ is denoted by $\bar{a}$. For
a complete introduction to geometric algebra see e.g. [10].

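All signals are considered to be defined on vector spaces, hence a real 1D
signal is no longer a function $\mathbb{R} \to \mathbb{R}$ but a function $e_1\mathbb{R} \to e_0\mathbb{R}$ or a curve
in $\mathbb{R}_{0,1}^0 \oplus \mathbb{R}_{0,1}^1$. Accordingly, an image is a surface in $\mathbb{R}_{0,2}^0 \oplus \mathbb{R}_{0,2}^1$ (i.e. in a 3D
space), and an image sequence is a 3D subspace in $\mathbb{R}_{0,3}^0 \oplus \mathbb{R}_{0,3}^1$. Vectors (elements
of $\mathbb{R}_{0,n}^1 = \mathbb{R}^n$) are denoted in bold face: $\mathbf{x} = \sum_k x_k e_k$. Elements of $\mathbb{R}_{0,n}^0 \oplus \mathbb{R}_{0,n}^1$
(paravectors, [19]) are denoted in normal face: $x = x_0 + \mathbf{x}$.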

The Fourier transform of the nD signal f{x) is denoted 



Since we want to extend the analytic signal, we briefly introduce the 1D
approach (see e.g. [11]). The Hilbert transform of a 1D signal $f$ is denoted $f_H$ and
it is obtained by the transfer function $H(u) = e_1\, \mathrm{sign}(u)$ in the frequency domain
or by the corresponding convolution kernel in the spatial domain. The analytic
signal is obtained by the sum $f_A = f - f_H e_1$.

The typical property of the analytic signal is the split of identity which 
means that it contains local phase and local amplitude. While the local phase 
represents a qualitative measure of a structure, the local amplitude represents 
a quantitative measure of a structure (see e.g. [18,12]). For higher dimensions,
a consistent generalization of the analytic signal should keep the property of
splitting the identity.
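The split of identity can be made concrete with a short numerical sketch. It is not taken from the paper: it assumes ordinary complex numbers and the conventional definition $f + i f_H$ (which plays the role of $f - f_H e_1$ above), and the function name `analytic_signal` is hypothetical.

```python
import numpy as np

def analytic_signal(f):
    """Return the complex analytic signal of a real 1D array f via the FFT."""
    n = f.size
    F = np.fft.fft(f)
    # Frequency-domain weights: keep DC (and Nyquist) once, double the
    # positive frequencies, and suppress the negative ones.
    w = np.zeros(n)
    w[0] = 1.0
    if n % 2 == 0:
        w[n // 2] = 1.0
        w[1:n // 2] = 2.0
    else:
        w[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(F * w)

# Example: a cosine with an integer number of periods in the window.
N = 256
x = np.arange(N)
f = 0.7 * np.cos(2 * np.pi * 8 * x / N)
fa = analytic_signal(f)
local_amplitude = np.abs(fa)   # quantitative measure of the structure
local_phase = np.angle(fa)     # qualitative measure of the structure
```

For this test signal the local amplitude is constant while the local phase advances linearly, which is the separation of information the text refers to.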

Local structures in multi-dimensional signals can be classified in different 
categories according to the intrinsic dimensionality (see [14]). In our approach, 
we keep with a single structure phase. Therefore, we can only classify intrinsically
1D signals. What is left for a complete description of structure is the local
orientation. Obviously, local orientation is not independent of the phase because
the local direction of a signal fixes both properties at the same time (see [6]).
The resulting constraint is fulfilled by our approach which we will introduce in 
the next section. 

3 The Monogenic Signal 

The constraint which must be fulfilled by the multidimensional extension of the
analytic signal is the following: if the signal is rotated such that it is reflected
with respect to the origin¹ (e.g. in 2D: a rotation by $\pi$), the change of the orientation phase must
yield a negation of the structure phase. This is fulfilled if we use the standard
spherical coordinates and assign the first angle to the structure phase:

$$
\begin{aligned}
x_0 &= r \cos\phi_1 \\
x_1 &= r \sin\phi_1 \cos\phi_2 \\
&\;\;\vdots \\
x_{n-1} &= r \sin\phi_1 \cdots \sin\phi_{n-1} \cos\phi_n \\
x_n &= r \sin\phi_1 \cdots \sin\phi_{n-1} \sin\phi_n
\end{aligned}
$$

where $r = \sqrt{x\bar{x}}$, $\phi_2, \ldots, \phi_n \in [0; \pi)$ and $\phi_1 \in [0; 2\pi)$.

For the 2D case this coordinate system is illustrated in Fig. 1. The reflection of
a point with respect to the origin in vector space corresponds to a rotation of the angular
coordinate $\phi_2$ by $\pi$. This yields a negation of $\mathbf{x}$ (identical to the conjugation of
$x$) and therefore, the structure phase is negated as well. In other words, we have
to find an operator $O$ such that

$$O\{f(\mathbf{x})\} \in \mathbb{R}_{0,n}^0 \oplus \mathbb{R}_{0,n}^1 \quad \text{and} \tag{1}$$

$$O\{f(-\mathbf{x})\} = \overline{O\{f(\mathbf{x})\}} \tag{2}$$

which is a multidimensional generalization of the Hermite symmetry of the analytic
signal. Consequently, the $e_0$-part (real part) of $O\{f(\mathbf{x})\}$ must be even
and the vector part of $O\{f(\mathbf{x})\}$ must be odd. Up to now, it was a common
opinion that no odd operator with isotropic energy exists (e.g. [12]). Actually,
this statement is true if the operator is required to consist of only one scalar
valued component. If we extend our operator space to vector valued operators,
the statement is not true any more.

Fig. 1. Spherical coordinate system for 2D

¹ Note that for most dimensions, it is not possible to find a rotation which is identical
to a reflection through the origin for arbitrary objects. In our case, we only consider
intrinsically 1D signals, which means that it is sufficient to reflect with respect to the direction in
which the signal changes (normal vector). This is always done by a rotation around
an axis orthogonal to the normal vector through the origin.

The operator $O$ that fulfills the constraints (1) and (2) would be the multidimensional
generalization of the analytic signal we are looking for. The even
part of $O$ can be adopted from the 1D analytic signal, since the Dirac impulse
$\delta_0(\mathbf{x})$ is even and isotropic. The harder task is to find the odd part of $O$. If we
have an intrinsically 1D signal with normal vector $\mathbf{n}$ (i.e. $f(\mathbf{x}) = g((\mathbf{x} \cdot \mathbf{n})e_1)$),
a good choice for the odd part of $O$ would transform $f(\mathbf{x})$ such that

$$O_{\mathrm{odd}}\{f(\mathbf{x})\} = \pm\,\mathbf{n}\, g_H((\mathbf{x} \cdot \mathbf{n})e_1) \tag{3}$$

where $g_H$ is the 1D Hilbert transform of $g$. The factor $\mathbf{n}$ yields the odd symmetry
we need.

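In order to obtain $O_{\mathrm{odd}}$ we have to look at the Fourier transform of $f(\mathbf{x})$:

$$f(\mathbf{x}) \circ\!-\!\bullet F(\mathbf{u}) = \delta_0(\mathbf{u} \wedge \mathbf{n})\, G((\mathbf{u} \cdot \mathbf{n})e_1),$$

($\delta_0(\mathbf{u} \wedge \mathbf{n}) = \delta_0(-(\mathbf{u} \wedge \mathbf{n}) \cdot e_{12}) \cdots \delta_0(-(\mathbf{u} \wedge \mathbf{n}) \cdot e_{(n-1)n})$) and therefore,

$$O_{\mathrm{odd}}\{f(\mathbf{x})\} \circ\!-\!\bullet \pm\,\mathbf{n}\,\delta_0(\mathbf{u} \wedge \mathbf{n}) \operatorname{sign}(-\mathbf{u} \cdot \mathbf{n})\, G((\mathbf{u} \cdot \mathbf{n})e_1)\, e_1 .$$

Since $\mathbf{n} \operatorname{sign}(-\mathbf{u} \cdot \mathbf{n})$ is equal to $\mathbf{u}/|\mathbf{u}|$ wherever $\delta_0(\mathbf{u} \wedge \mathbf{n})$ is non-zero, we obtain

$$O_{\mathrm{odd}}\{f(\mathbf{x})\} \circ\!-\!\bullet \pm\,\frac{\mathbf{u}}{|\mathbf{u}|}\,\delta_0(\mathbf{u} \wedge \mathbf{n})\, G((\mathbf{u} \cdot \mathbf{n})e_1)\, e_1 .$$

The transfer function of the generalized Hilbert transform reads

$$H(\mathbf{u}) = \frac{\mathbf{u}}{|\mathbf{u}|} , \tag{4}$$

($O_{\mathrm{odd}}\{f(\mathbf{x})\} \circ\!-\!\bullet -H(\mathbf{u})F(\mathbf{u})e_1$) where we chose the sign according to the 1D
case. The transfer function of the 1D Hilbert transform reads $H(u) = \operatorname{sign}(u)\,e_1$.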

The spatial representation of (4), which can be obtained using the Hankel
transform (see [1,4]),

$$h(\mathbf{x}) = \frac{\Gamma((n+1)/2)}{\pi^{(n+1)/2}}\, \frac{\mathbf{x}}{|\mathbf{x}|^{n+1}}\, e_1 , \tag{5}$$

is the kernel of the Riesz transform, which is the multidimensional generalization
of the Hilbert transform from a mathematician’s point of view (see e.g. [21] and
also [17]). The factor in (5) is two times one over the surface area of the n-dimensional
unit sphere ($\Gamma$: Gamma function). Actually, the kernel of the Riesz
transform is closely related to the Cauchy kernel

$$E(x) = \frac{\Gamma((n+1)/2)}{2\pi^{(n+1)/2}}\, \frac{\bar{x}}{|x|^{n+1}} = \frac{\Gamma((n+1)/2)}{2\pi^{(n+1)/2}}\, \frac{x_0 e_0 - \mathbf{x}}{|x_0 e_0 + \mathbf{x}|^{n+1}} \tag{6}$$

(we have $E(x)|_{x_0=0} = \tfrac{1}{2} h(\mathbf{x}) e_1$), which is the fundamental solution of the generalized
Cauchy-Riemann differential equations:

$$\sum_{k=0}^{n} e_k \frac{\partial f(x)}{\partial x_k} = 0 . \tag{7}$$

If a (Clifford valued) function $f(x)$ solves this system of differential equations,
it is called a (left) monogenic function. Therefore, the monogenic function is the
multidimensional generalization of the analytic function.

The analytic signal got its name from the analytic function because the
Hilbert transform is identical to a convolution with the 1D Cauchy kernel
for $x_0 = 0$ (up to a factor of two) and therefore, the Hilbert transform of an
analytic signal reads

$$f_A(x) = \frac{1}{\pi} \int \frac{f_A(t)}{x - t}\, dt \tag{8}$$

which is quite similar to the Cauchy formula for analytic functions (for details
see [9]).

Since the Riesz transform is a convolution with the nD Cauchy kernel for
$x_0 = 0$ (up to a factor of two) and the Riesz transform of the signal

$$f_M(\mathbf{x}) = f(\mathbf{x}) - f_H(\mathbf{x})\,e_1 \tag{9}$$

fulfills

$$f_M(\mathbf{x}) = \frac{\Gamma((n+1)/2)}{\pi^{(n+1)/2}} \int \frac{(\mathbf{x} - \mathbf{t})\, f_M(\mathbf{t})}{|\mathbf{x} - \mathbf{t}|^{n+1}}\, d\mathbf{t} \tag{10}$$

which can be obtained from the generalized Cauchy formula for monogenic functions,
the signal $f_M(\mathbf{x})$ is called the monogenic signal.

We conclude this section with a last remark. The monogenic signal can be 
obtained in three ways: 

1. by the transfer function 1 — 

2. by the convolution kernel (5 q + and 

3. by a modified inverse Fourier transform (see [4] for the 2D case) 

Finally, the proofs of the relations from Clifford analysis can be found in [2] 
and some proofs of the relations of the monogenic signal can be found in [4] . 
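As an illustration of the transfer-function route, the following sketch computes a 2D monogenic signal with the FFT. It is an assumption-laden stand-in rather than the authors' implementation: it uses ordinary complex arithmetic with the commonly used Riesz multipliers $-i\,u_j/|\mathbf{u}|$ instead of the Clifford-valued transfer function (4), so sign conventions may differ from the text, and the function name `monogenic_signal` is hypothetical.

```python
import numpy as np

def monogenic_signal(f):
    """Return (f, r1, r2): the image and its two Riesz-transform components."""
    rows, cols = f.shape
    u1 = np.fft.fftfreq(rows)[:, None]   # frequency along axis 0
    u2 = np.fft.fftfreq(cols)[None, :]   # frequency along axis 1
    radius = np.sqrt(u1 ** 2 + u2 ** 2)
    radius[0, 0] = 1.0                   # avoid 0/0; the numerator is 0 at DC
    F = np.fft.fft2(f)
    r1 = np.real(np.fft.ifft2(-1j * u1 / radius * F))
    r2 = np.real(np.fft.ifft2(-1j * u2 / radius * F))
    return f, r1, r2

# Intrinsically 1D test signal: a plane wave with normal vector (0.6, 0.8).
N = 64
x1, x2 = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
theta = 2 * np.pi * (3.0 * x1 + 4.0 * x2) / N
f, r1, r2 = monogenic_signal(np.cos(theta))

local_amplitude = np.sqrt(f ** 2 + r1 ** 2 + r2 ** 2)
local_phase = np.arctan2(np.sqrt(r1 ** 2 + r2 ** 2), f)
local_orientation = np.arctan2(r2, r1)
```

For the plane wave the vector part equals the normal vector times the sine of the phase, so the local amplitude is constant over the whole image regardless of the orientation of the wave.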



4 Spherical Quadrature Filters 

In practical cases of signal processing, signals are of finite length. Therefore, the 
Hilbert transform is calculated for a bandpass filtered version of the signal. The 
Hilbert transform of the bandpass filter and the bandpass filter itself form a pair 
of quadrature filters. This approach can also be applied to the Riesz transform 



The Multidimensional Isotropic Generalization 



181 



in order to obtain the multidimensional generalization of quadrature filters: the 
spherical quadrature filters (SQF). 

The SQF are an (n+1)-tuple of filters which are created by a radial bandpass
filter and the convolution of the Riesz kernel (n components) with this bandpass
filter. The energy of the filter set is isotropic (if the effect of the cubic filter mask
can be neglected, see e.g. [3,15]) and the filters estimate the local amplitude, local phase
and local orientation with only n + 1 convolutions. Hence, the approach is fast and
should be real-time capable.

The SQF are related to the steerable quadrature filters (e.g. [5]),
since the vector part is steerable. But there are some differences: firstly, the
steered quadrature pair is an approximation to a classical quadrature filter with
arbitrary preference orientation.

Secondly, the orientation parameter is directly obtained from the vector part 
of the SQF in contrast to the steered quadrature filters. The reason for this is 
the degree of the polynomials which has to be at least two in order to obtain 
a sufficient approximation of the Hilbert transform. The SQF correspond to 
polynomials of degree zero and one, such that the filter responses are constant 
and linear with the orientation vector. 

Thirdly, the radial bandpass is derived from a Gaussian function in [5],
whereas for the spherical quadrature filters any bandpass can be chosen. In
our approach, we use the lognormal bandpass because it has some fundamental
advantages over the Gaussian function (see also [13]): it allows arbitrarily large
bandwidth while always being DC-free.

Therefore, our bandpass filter is represented in the frequency domain by

$$B(\mathbf{u}) = \exp\left( -\frac{(\log(|\mathbf{u}|/k))^2}{2(\log(c))^2} \right) \tag{11}$$

where $k$ is a constant indicating the center frequency and $c$ is a constant indicating
the bandwidth of the bandpass (e.g. $c = 0.55$ corresponds to two octaves).
The transfer functions and impulse responses of some filters are illustrated in
Fig. 2.
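The lognormal bandpass can be evaluated directly on a grid of frequency magnitudes. The sketch below is illustrative only: the function name `lognormal_bandpass` and the sample values are hypothetical, and the value at $|\mathbf{u}| = 0$ is set to zero explicitly (the limit of the expression) to avoid taking the logarithm of zero.

```python
import numpy as np

def lognormal_bandpass(radius, k, c):
    """Evaluate B = exp(-(log(|u|/k))^2 / (2 (log c)^2)), with B(0) = 0."""
    radius = np.atleast_1d(np.asarray(radius, dtype=float))
    B = np.zeros_like(radius)
    positive = radius > 0
    B[positive] = np.exp(-np.log(radius[positive] / k) ** 2
                         / (2.0 * np.log(c) ** 2))
    return B

# Sample the filter at DC, one octave below, at, and above the center frequency.
radius = np.array([0.0, 0.05, 0.1, 0.2, 0.4])
B = lognormal_bandpass(radius, k=0.1, c=0.55)
```

The response is symmetric on a logarithmic frequency axis, peaks at the center frequency $k$, and vanishes at DC for any choice of $c$, which is the property the text highlights.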

We applied the 2D SQF to some natural and synthetic images and the 3D
filters to a synthetic image sequence². The results of the 2D experiments (synthetic
images) can be found in Fig. 3. The amplitude of the filter responses
of the Siemens star and the modulated ring show the isotropy of the spherical
quadrature filters. The local orientation and the local phase are represented as
grey values which vary linearly in angular and radial direction, respectively.

The experiments with real images³ are shown in Fig. 4. The local amplitude
indicates where to find edges. The absolute value depends on both the markedness
of the structure and the local contrast. The images of the local orientation
and the local phase are masked by a threshold of the local amplitude. In the
areas of very low local amplitude the phase and orientation are irrelevant due
to noise. Note that the phase and the orientation are cyclic. Therefore, white
indicates nearly the same angle as black.

² The 3D results can be found as mpeg-movies at the homepage of the first author
(URL: http://www.ks.informatik.uni-kiel.de/~mfe). The normal vector of the plane
is estimated with an error of less than 0.1°.

³ URL: http://www-syntim.inria.fr/syntim/analyse/images-eng.html

Fig. 2. Spherical quadrature filters (lognormal radial bandpass). Upper row:
even filters, bottom row: odd filters. Left column: frequency domain, middle and
right column: spatial domain with different c/k-ratio

Fig. 3. Experiments with synthetic 2D data. Siemens star (left column, top
and bottom), signal and amplitude of filter response, modulated ring (middle
column, top) and filter response (amplitude: bottom middle, local orientation:
upper right, and local phase: bottom right)

Fig. 4. Experiments with real 2D data. From left to right: original image, local
amplitude, local orientation and local phase, images from INRIA-Syntim ©

5 Conclusion 

In our opinion, the monogenic signal is the consistent multidimensional generalization
of the analytic signal. It is seamlessly embedded into the theory of Clifford
analysis. The Riesz transform is an elegant way to overcome the problem of odd
isotropic multi-dimensional filters and therefore, it is the best way to generalize
the Hilbert transform, not only from a mathematician’s point of view but also
from the perspective of a signal-theorist.

The spherical quadrature filters are more capable than the classical quadra- 
ture filters for higher dimensions. Besides the complete representation of local 
structure, they are linear and of lower complexity than classical approaches (e.g. 
structure tensor). Due to linearity, they form a good basis for the design of linear 
and non-linear second level filters. 

The algebraic representation of the filter response is geometrically insightful. 
The interpretation of the data is directly given by the involved geometry, all cal- 
culations can be vividly designed. This proves once more the power of geometric 
algebra.

Abstract. This article describes an essential step towards what is called
a view centered representation of the low-level structure in an image. Instead
of representing low-level structure (lines and edges) in one compact
feature map, we will separate structural information into several feature
maps, each signifying features at a characteristic phase, in a specific scale.
By characteristic phase we mean the phases 0, π, and ±π/2, corresponding
to bright and dark lines, and edges between different intensity levels,
or colours. A lateral inhibition mechanism selects the strongest feature
within each local region of scale represented. The scale representation is
limited to maps one octave apart, but can be interpolated to provide a
continuous representation. The resultant image representation is sparse,
and thus well suited for further processing, such as pattern detection.

Keywords: sparse coding, image representation, view centered repre- 
sentation, edge detection, scale hierarchy, characteristic phase 



1 Introduction 

From neuroscience we know that biological vision systems interpret visual stimuli 
by separation of image features into several retinotopic maps [5]. These maps 
encode highly specific information such as colour, structure (lines and edges), 
motion, and several high-level features not yet fully understood. This feature 
separation is in sharp contrast to what many machine vision applications do 
when they synthesize image features into objects. We have earlier discussed 
these two approaches, which are called view centered, and object centered image 
representations [8]. This report describes an attempt to move one step further 
towards a view centered representation of low level properties. 

As we move upwards in the interpretation hierarchy in biological vision sys- 
tems, the cells within each feature map tend to be increasingly selective, and 
consequently the high level maps tend to employ more sparse representations [3]. 
There are several good reasons why biological systems employ sparse represen- 
tations, many of which could also apply to machine vision systems. 

Sparse coding tends to minimize the activity within an over-complete feature
set, whilst maintaining the information conveyed by the features. This leads to
representations in which pattern recognition, template storage, and matching are
made easier [3]. Compared to compact representations, sparse features convey
more information when they are active, and contrary to how it might appear,
the amount of computation will not be increased significantly, since only the
active features need to be considered.

Most feature generation procedures employ filtering in some form. The out- 
puts from these filters tell quantitatively more about the filters used than the 
structures they were meant to detect. We can get rid of this excessive load of 
data, by allowing only certain phases of output from the filters to propagate 
further. These characteristic phases have the property that they give invariant 
structural information rather than all the phase components of a filter response. 

The feature maps we generate describe image structure in a specific scale,
and at a specific phase. The distance between the different scales is one octave
(i.e. each map has half the center frequency of the previous one). The phases we
detect are those near the characteristic phases¹ 0, π, and ±π/2. Thus, for each
scale, we will have three resultant feature maps (see figure 1).



Fig. 1. Scale hierarchies (image scale pyramid with one feature map each for the 0 phase, the π phase, and the ±π/2 phase at every scale)



This approach touches the field of scale-space analysis pioneered by 
Witkin [1]. See [2] for a recent overview. Our approach to scale space analy- 
sis is somewhat similar to that of Reisfield [4]. Reisfield has defined what he 
calls a Constrained Phase Congruency Transform (CPCT), that maps a pixel 
position and an orientation to an energy value, a scale, and a symmetry phase (0,
π, ±π/2, or none). We will instead map each image position, at a given scale, to
three complex numbers, one for each of the characteristic phases. The argument 
of the complex numbers indicates the dominant orientation of the local image 
region at the given scale, and the magnitude indicates the local signal energy 
when the phase is near the desired one. As we move away from the characteristic 
phase, the magnitude will go to zero. This representation will result in a number 
of complex valued images that are quite sparse, and thus suitable for pattern 
detection. 

¹ We will define the concept of characteristic phase in a following section.

2 Phase from Line and Edge Filters

For signals containing multiple frequencies, the phase is ambiguous, but we can 
always define the local phase of a signal, as the phase of the signal in a narrow 
frequency range. 

The local phase can be computed from the ratio between a band-pass filter
(even, denoted $f_e$) and its quadrature complement (odd, denoted $f_o$). These
two filters are usually combined into a complex valued quadrature filter, $f = f_e + i f_o$ [6].
The real and imaginary parts of a quadrature filter correspond to
line and edge detecting filters respectively. The local phase can now be computed
as the argument of the filter response, $q(x)$, or if we use the two real-valued filters
separately, as the four quadrant inverse tangent, $\arctan(q_o(x), q_e(x))$.

To construct the quadrature pair, we start with a discretized lognormal filter
function, defined in the frequency domain.

$$R_i(\rho) = \begin{cases} e^{-\frac{\ln^2(\rho/\rho_i)}{\ln 2}} & \text{if } \rho > 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

The parameter $\rho_i$ determines the peak of the lognorm function, and is called
the center frequency of the filter. We now construct the even and odd filters
as the real and imaginary parts of an inverse discrete Fourier transform of this
filter.²

$$f_{e,i}(x) = \operatorname{Re}(\mathrm{IDFT}\{R_i(\rho)\}) \tag{2}$$

$$f_{o,i}(x) = \operatorname{Im}(\mathrm{IDFT}\{R_i(\rho)\}) \tag{3}$$

We write a filtering of a sampled signal, $s(x)$, with a discrete filter $f_k(x)$
as $q_k(x) = (s * f_k)(x)$, giving the response signal the same indices as the filter
that produced it.
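The construction of the quadrature pair can be sketched as follows. This is a minimal illustration, not the authors' code: the exponent constant $1/\ln 2$ follows the form of equation (1) as given here, the kernel size of 65 samples and the helper name `lognormal_quadrature_pair` are assumptions, and the plain inverse DFT is used even though better spatial kernels can be obtained by optimization.

```python
import numpy as np

def lognormal_quadrature_pair(rho_i, size=65):
    """Return (f_e, f_o), the real and imaginary parts of IDFT{R_i}."""
    rho = 2 * np.pi * np.fft.fftfreq(size)   # angular frequency in (-pi, pi)
    R = np.zeros(size)
    pos = rho > 0                            # one-sided: zero for rho <= 0
    R[pos] = np.exp(-np.log(rho[pos] / rho_i) ** 2 / np.log(2))
    f = np.fft.fftshift(np.fft.ifft(R))      # centre the kernel at size // 2
    return f.real, f.imag

f_e, f_o = lognormal_quadrature_pair(np.pi / 4)

# Filter a signal containing a single bright line and read off the local phase.
s = np.zeros(128)
s[64] = 1.0
q = np.convolve(s, f_e + 1j * f_o, mode="same")
local_phase = np.arctan2(q.imag, q.real)
```

Because the frequency function is real and one-sided, the even filter is symmetric, the odd filter is antisymmetric, and the local phase at the position of the bright line is 0.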

3 Characteristic Phase 

By characteristic phase we mean phases that are consistent over a range of scales,
and thus characterize the local image region. In practice this only happens at
local magnitude peaks of the responses from the even and odd filters.³ In other
words, the characteristic phases are always one of 0, π, and ±π/2.

Only some occurrences of these phases are consistent over scale though (see
figure 2). First, we can note that band-pass filtering always causes ringings in
the response. For isolated line and edge events this will mean one extra magnitude
peak (with the opposite sign) at each side of the peak corresponding to
the event. These extra peaks will move when we change frequency bands, and
consequently they do not correspond to characteristic phases. Second, we can
note that each line event will produce one magnitude peak in the line response,
and two peaks in the edge response. The peaks in the edge response, however,
are not consistent over scale. Instead they will move as we change frequency
bands. This phenomenon can be used to sort out the desired peaks.

² Note that there are other ways to obtain spatial filters from frequency descriptions
that in many ways produce better filters [7].

³ A peak in the even response will always correspond to a zero crossing in the odd
response, and vice versa, due to the quadrature constraint.

Fig. 2. Line and edge filter responses in 1D
Top: A one-dimensional signal.
Center: Line responses at $\rho_i = \pi/2$ (solid), and $\pi/4$ and $\pi/8$ (dashed)
Bottom: Edge responses at $\rho_i = \pi/2$ (solid), and $\pi/4$ and $\pi/8$ (dashed)



4 Extracting Characteristic Phase in 1D

Starting from the line and edge filter responses at scale $i$, $q_{e,i}$ and $q_{o,i}$, we now
define three phase channels:

$$p_{1,i} = \max(0, q_{e,i}) \tag{4}$$

$$p_{2,i} = \max(0, -q_{e,i}) \tag{5}$$

$$p_{3,i} = \operatorname{abs}(q_{o,i}) \tag{6}$$

That is, we let $p_{1,i}$ constitute the positive part of the line filter response,
corresponding to 0 phase, $p_{2,i}$ the negative part, corresponding to π phase,
and $p_{3,i}$ the magnitude of the edge filter response, corresponding to ±π/2 phase.

Phase invariance over scale can be expressed by requiring that the signal at
the next lower octave has the same phase:

$$p_{1,i} = \max(0, q_{e,i} \cdot q_{e,i-1}/a_{i-1}) \cdot \max(0, \operatorname{sign}(q_{e,i})) \tag{7}$$

$$p_{2,i} = \max(0, q_{e,i} \cdot q_{e,i-1}/a_{i-1}) \cdot \max(0, \operatorname{sign}(-q_{e,i})) \tag{8}$$

$$p_{3,i} = \max(0, q_{o,i} \cdot q_{o,i-1}/a_{i-1}) \tag{9}$$

The first max operation in the equations above will set the magnitude to
zero whenever the filter at the next scale has a different sign. This operation
will reduce the effect of the ringings from the filters. In order to keep the magnitude
near the characteristic phases proportional to the local signal energy,
we have normalized the product with the signal energy at the next lower octave,
$a_{i-1} = \sqrt{q_{e,i-1}^2 + q_{o,i-1}^2}$. The result of this operation can be viewed as a
phase description at a scale in between the two used. These channels are compared
with the original ones in figure 3.




Fig. 3. Consistent phase in 1D. ($\rho_i = \pi/4$)
$p_{1,i}$, $p_{2,i}$, $p_{3,i}$ according to equations 4-6 (dashed), and equations 7-9 (solid)



We will now further constrain the phase channels in such a way that only
responses consistent over scale are kept. We do this by inhibiting the phase
channels with the complementary response in the third lower octave:

$$c_{1,i} = \max(0, p_{1,i} - \alpha \operatorname{abs}(q_{o,i-2})) \tag{10}$$

$$c_{2,i} = \max(0, p_{2,i} - \alpha \operatorname{abs}(q_{o,i-2})) \tag{11}$$

$$c_{3,i} = \max(0, p_{3,i} - \alpha \operatorname{abs}(q_{e,i-2})) \tag{12}$$

We have chosen an amount of inhibition $\alpha = 2$, and the base scale $\rho_i = \pi/4$.
With these values we successfully remove the edge responses at the line event, and
at the same time keep the rate of change in the resultant signal below the Nyquist
frequency. The resultant characteristic phase channels will have a magnitude
corresponding to the energy at scale $i$, near the corresponding phase. These
channels are compared with the original ones in figure 4.
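Equations (7) to (12) translate almost line by line into array operations. The sketch below is one possible reading under stated assumptions: the responses are stored in lists indexed by scale, the function name `characteristic_phase_channels` is hypothetical, and the small constant `eps` is an addition of this sketch to avoid division by zero where the energy vanishes.

```python
import numpy as np

def characteristic_phase_channels(q_e, q_o, i, alpha=2.0, eps=1e-12):
    """Return (c1, c2, c3) at scale i from even/odd responses q_e[j], q_o[j]."""
    # Signal energy at the next lower octave (eps is not part of the text).
    a_prev = np.sqrt(q_e[i - 1] ** 2 + q_o[i - 1] ** 2) + eps
    even = np.maximum(0.0, q_e[i] * q_e[i - 1] / a_prev)
    p1 = even * np.maximum(0.0, np.sign(q_e[i]))             # eq. (7)
    p2 = even * np.maximum(0.0, np.sign(-q_e[i]))            # eq. (8)
    p3 = np.maximum(0.0, q_o[i] * q_o[i - 1] / a_prev)       # eq. (9)
    # Inhibition with the complementary response two scales down.
    c1 = np.maximum(0.0, p1 - alpha * np.abs(q_o[i - 2]))    # eq. (10)
    c2 = np.maximum(0.0, p2 - alpha * np.abs(q_o[i - 2]))    # eq. (11)
    c3 = np.maximum(0.0, p3 - alpha * np.abs(q_e[i - 2]))    # eq. (12)
    return c1, c2, c3

# Three sample positions, three scales (index 0, 1, 2); values are made up.
q_e = [np.array([0.0, 0.0, 0.0]), np.array([1.0, -1.0, 0.0]), np.array([2.0, -2.0, 0.0])]
q_o = [np.array([0.25, 0.0, 0.0]), np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, 3.0])]
c1, c2, c3 = characteristic_phase_channels(q_e, q_o, i=2)
```

In the sample data the first position is a bright line partly inhibited by an edge response two scales down, the second is a dark line, and the third is an edge response whose sign flips between scales and is therefore suppressed.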

As we can see, this operation manages to produce channels that indicate lines
and edges without any unwanted extra responses. An important aspect of this
operation is that it results in a gradual transition between the description of a
signal as a line or an edge. If we continuously increase the thickness of a line,
it will gradually turn into a bar that will be represented as two edges. This
phenomenon is illustrated in figure 5.

Fig. 4. Phase channels in 1D. ($\rho_i = \pi/4$, $\alpha = 2$)
$p_{1,i}$, $p_{2,i}$, $p_{3,i}$ according to equations 4-6 (dashed), and equations 10-12 (solid)

Fig. 5. Transition between line and edge description. ($\rho_i = \pi/4$)
Top: Signal. Center: $c_{1,i}$ phase channel. Bottom: $c_{3,i}$ phase channel



5 Local Orientation Information 

The filters we employ in 2D will be the extension of the lognorm filter function
(equation 1) to 2D [6]:

$$F_{ki}(\mathbf{u}) = R_i(\rho)\, D_k(\hat{\mathbf{u}}) \tag{13}$$

where

$$D_k(\hat{\mathbf{u}}) = \begin{cases} (\hat{\mathbf{u}} \cdot \hat{\mathbf{n}}_k)^2 & \text{if } \hat{\mathbf{u}} \cdot \hat{\mathbf{n}}_k > 0 \\ 0 & \text{otherwise} \end{cases} \tag{14}$$

We will use four filters, with directions $\hat{\mathbf{n}}_1 = (0\;\;1)^T$, $\hat{\mathbf{n}}_2 = (\sqrt{0.5}\;\;\sqrt{0.5})^T$,
$\hat{\mathbf{n}}_3 = (1\;\;0)^T$, and $\hat{\mathbf{n}}_4 = (\sqrt{0.5}\;\;{-\sqrt{0.5}})^T$. These directions have angles that are
uniformly distributed modulo π. Due to this, and the fact that the angular
function decreases as $\cos^2$, the sum of the filter-response magnitudes will be
orientation invariant [6].

Just like in the 1D case, we will perform the filtering in the spatial domain:

$$(f_{e,ki} * g_{ki})(\mathbf{x}) \approx \operatorname{Re}(\mathrm{IDFT}\{F_{ki}(\mathbf{u})\}) \tag{15}$$

$$(f_{o,ki} * g_{ki})(\mathbf{x}) \approx \operatorname{Im}(\mathrm{IDFT}\{F_{ki}(\mathbf{u})\}) \tag{16}$$

Here we have used a filter optimization technique [7] to separate the lognorm
quadrature filters into two approximately one-dimensional components. The filter
$g_{ki}(\mathbf{x})$ is a smoothing filter in a direction orthogonal to $\hat{\mathbf{n}}_k$, while $f_{e,ki}(\mathbf{x})$
and $f_{o,ki}(\mathbf{x})$ constitute a 1D lognorm quadrature pair in the $\hat{\mathbf{n}}_k$ direction.

Using the responses from the four quadrature filters, we can construct a local
orientation image. This is a complex valued image, in which the magnitude
of each complex number indicates the signal energy when the neighbourhood
is locally one-dimensional, and the argument of the numbers denotes the local
orientation, in the double angle representation [6].

$$z(\mathbf{x}) = \sum_k a_{ki}(\mathbf{x})\,(\hat{n}_{k1} + i\,\hat{n}_{k2})^2 = a_{1i}(\mathbf{x}) - a_{3i}(\mathbf{x}) + i\,(a_{2i}(\mathbf{x}) - a_{4i}(\mathbf{x})) \tag{17}$$

where $a_{ki}(\mathbf{x})$, the signal energy, is defined as $a_{ki} = \sqrt{q_{e,ki}^2 + q_{o,ki}^2}$.

⁴ Note that the fact that both the line and the edge statements are low near the
fourth event (positions 105 to 125) does not mean that this event will be lost. The
final representation will also include other scales of filters, which will describe these
events better.




6 Extracting Characteristic Phase in 2D 

To illustrate characteristic phase in 2D, we need a new test pattern. We will use
the 1D signal from figure 5, rotated around the origin (see figure 6). The image
has also been degraded with a small amount of Gaussian noise. The signal to
noise ratio is 10 dB.

When extracting characteristic phases in 2D we will make use of the same
observation as the local orientation representation does: Since visual stimuli can
locally be approximated by a simple signal in the dominant orientation [6], we
can define the local phase as the phase of the dominant signal component.

To deal with characteristic phases in the dominant signal direction, we first
synthesize responses from a filter in a direction, $\hat{\mathbf{n}}_z$, compatible with the local
orientation.⁵

$$\hat{\mathbf{n}}_z = (\operatorname{Re}(\sqrt{z})\;\;\operatorname{Im}(\sqrt{z}))^T \tag{18}$$

⁵ Since the local orientation, $z$, is represented with a double angle argument, we could
just as well have chosen the opposite direction. Which one of these we choose does
not really matter, as long as we are consistent.

Fig. 6. A 2D test pattern. (10 dB SNR)

The filters will be weighted according to the value of the scalar product
between the filter direction and this orientation compatible direction.

$$w_k = \hat{\mathbf{n}}_k^T \hat{\mathbf{n}}_z \tag{19}$$

Thus, in each scale we synthesize one odd, and one even response projection
as:

$$q_{e,i} = \sum_k q_{e,i,k} \operatorname{abs}(w_k) \tag{20}$$

$$q_{o,i} = \sum_k q_{o,i,k}\, w_k \tag{21}$$

This will change the sign of the odd responses when the directions differ
by more than π/2, but since the even filters are symmetric, they should always have a
positive weight. In accordance with our findings in the 1D study (equations 7-9,
10-12), we now compute three phase channels, $c_{1,i}$, $c_{2,i}$, and $c_{3,i}$, in each scale.
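The projection of the four directional responses onto the orientation compatible direction, equations (18) to (21), can be sketched as below. The function name `project_responses` is hypothetical, and normalizing $\hat{\mathbf{n}}_z$ to unit length is an assumption of this sketch (the square root of $z$ alone has magnitude $\sqrt{|z|}$); the local orientation $z$ is taken as a given input.

```python
import numpy as np

# Filter directions n_1 ... n_4 as listed in the text.
directions = np.array([[0.0, 1.0],
                       [np.sqrt(0.5), np.sqrt(0.5)],
                       [1.0, 0.0],
                       [np.sqrt(0.5), -np.sqrt(0.5)]])

def project_responses(z, q_e, q_o):
    """Combine four directional responses into one even and one odd response.

    z        : complex local orientation (double angle), scalar or array.
    q_e, q_o : arrays with leading axis of length 4 (one entry per filter).
    """
    root = np.sqrt(np.asarray(z, dtype=complex))       # halves the argument
    n_z = np.stack([root.real, root.imag])             # eq. (18)
    norm = np.maximum(np.linalg.norm(n_z, axis=0), 1e-12)
    n_z = n_z / norm                                   # assumption: unit length
    w = np.tensordot(directions, n_z, axes=(1, 0))     # eq. (19)
    q_even = np.sum(q_e * np.abs(w), axis=0)           # eq. (20)
    q_odd = np.sum(q_o * w, axis=0)                    # eq. (21)
    return q_even, q_odd

q_e = np.array([1.0, 1.0, 1.0, 1.0])
q_o = np.array([1.0, 1.0, 1.0, -1.0])
even_a, odd_a = project_responses(1.0, q_e, q_o)    # n_z = (1, 0)
even_b, odd_b = project_responses(-1.0, q_e, q_o)   # n_z = (0, 1)
```

The even responses are always added with non-negative weights, whereas the odd response of the fourth filter changes sign between the two example orientations, as described in the text.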

The characteristic phase channels are shown in figure 7.⁶ As we can see, the
channels exhibit a smooth transition between describing the white regions in the
test pattern (see figure 6) as lines and as two edges. Also note that the phase
statements actually give the phase in the dominant orientation, and not in the
filter directions, as was the case for CPCT [4].

7 Local Orientation and Characteristic Phase 

An orientation image can be gated with a phase channel, $c_n(\mathbf{x})$, in the following
way:

$$\begin{cases} 0 & \text{if } c_n(\mathbf{x}) = 0 \\ \;\ldots & \text{otherwise} \end{cases} \tag{22}$$

⁶ The magnitude of lines this thin can be difficult to reproduce in print. However, the
magnitudes in this plot should vary just like in figure 5.

Fig. 7. Characteristic phase channels in 2D. ($\rho_i = \pi/4$)
Left to right: Characteristic phase channels $c_{1,i}$, $c_{2,i}$, and $c_{3,i}$, according to equations 10-12 ($\alpha = 2$)



We now do this for each of the characteristic phase statements $c_{1,i}(\mathbf{x})$, $c_{2,i}(\mathbf{x})$,
and $c_{3,i}(\mathbf{x})$, in each scale. The magnitude of the result is shown in figure 8. Notice
for instance how the bridge near the center of the image changes from being 
described by two edges, to being described as a bright line, as we move through 
scale space. 



8 Concluding Remarks 

The strategy of this approach for low-level representation is to provide sparse, 
and reliable statements as much as possible, rather than to provide statements 
in all points. 

Traditionally, the trend has been to combine or merge descriptive components 
as much as possible; mainly to reduce storage and computation. As the demands 
on performance are increasing it is no longer clear why components signifying 
different phenomena should be mixed. An edge is something separating two 
regions with different properties, and a line is something entirely different. 

The use of sparse data representations in computation leads to a mild increase 
in data volume for separate representations, compared to combined representa- 
tions. 

Although the representation is given in discrete scales, this can be viewed
as a conventional sampling, albeit in scale space, which allows interpolation
between these discrete scales, with the usual restrictions imposed by the sampling
theorem. The requirement of a good interpolation between scales determines the
optimal relative bandwidth of filters to use.

Fig. 8. Sparse feature hierarchy. ($\rho_i = \{\pi/2, \pi/4, \pi/8, \pi/16\}$)

Abstract. In this paper estimation and extraction of complex articulated
rotational motion is addressed using the mathematical framework
of geometric algebra (GA). The basis of the method outlined here is the 
ability to optimise expressions with respect to rotors (the quantities that 
perform rotations) and rotational bivectors - something which would be 
very difficult to do with conventional representations. As an application 
of these techniques we then look at the problem of tracking in optical 
motion capture and give some results on real and simulated data. It will 
also be illustrated how the same mathematics can be used for model 
extraction and inverse kinematics. 

Keywords: Rotations, geometric algebra, articulated motion, motion 
estimation, motion modelling, tracking. 



1 Introduction 

Estimating rotational motion in 3D is important in many real-world applications; 
e.g. tracking, human motion analysis, biomechanics, robotics and animation. It 
is often the case in such applications that substantial effort is spent on finding 
efficient methods for representing and estimating rotations. In this paper we 
will show that the rotor formulation used in geometric algebra has distinct ad- 
vantages over more conventional representations in terms of both efficiency and 
consistency. In particular, the multivector calculus, which enables us to differentiate
with respect to any element of the algebra (rotors, bivectors etc.), allows us to optimise
over the correct manifolds.

2 An Introduction to Geometric Algebra 

There are now many good introductions to GA; detailed treatments of GA can be
found in [2], [1] and briefer introductions in [4], [6]. A tutorial introduction with a
GA software package called GABLE, which is used extensively for simulations
in this paper, is available at [7]. Here, we therefore give only the briefest of
introductions. In what follows we adopt the convention that vectors will be
written as lower case roman letters (not bold, except where we discuss state and
observation vectors in the tracking) while multivectors will be written as upper
case roman letters.

Let Qn denote the geometric algebra of n-dimensions - this is a graded lin- 
ear space. As well as vector addition and scalar multiplication we have a non- 
commutative product which is associative and distributive over addition - this 
is the geometric or Clifford product. A further distinguishing feature of the 
algebra is that any vector squares to give a scalar. The geometric product of two 
vectors a and b is written ab and can be expressed as a sum of its symmetric 
and antisymmetric parts 



$$ab = a \cdot b + a \wedge b, \qquad (1)$$

where the inner product $a \cdot b$ and the outer product $a \wedge b$ can therefore be
defined in terms of the more fundamental geometric product. The crucial point here is
that the geometric product is invertible, unlike either the inner or outer products.

The inner product of two vectors is the standard scalar or dot product and 
produces a scalar. The outer or wedge product of two vectors is a new quantity 
we call a bivector. We think of a bivector as a directed area in the plane
containing a and b, formed by sweeping a along b. The outer product of k vectors
is a k-vector or k-blade, and such a quantity is said to have grade k. A general
multivector, A, is made up of a linear combination of objects of different
grade, i.e.



$$A = \langle A \rangle_0 + \langle A \rangle_1 + \langle A \rangle_2 + \ldots + \langle A \rangle_n \qquad (2)$$

where $\langle A \rangle_r$ is a pure r-blade. GA provides a means of manipulating
multivectors which allows us to keep track of different grade objects simultaneously,
much as one does with complex number operations. For a general multivector A, the
notation $\langle A \rangle$ will mean take the scalar part of A. $\tilde{A}$ denotes
reversion and tells us to reverse the order of all vectors in A, i.e.
$(abc)^{\sim} = cba$.

We can multiply together any two multivectors using the geometric product 
and we can also define the inner and outer products between arbitrary
multivectors as the following grade-lowering and grade-raising operations:
$A_r \cdot B_s = \langle A_r B_s \rangle_{|r-s|}$ and
$A_r \wedge B_s = \langle A_r B_s \rangle_{r+s}$. Of particular interest to us in this
paper are the GA quantities which perform rotations. It can be shown [8] that
in any dimension, the multivectors which represent rotations can be written as
$R = \pm e^{B}$, where B is a pure bivector representing the plane in which the rotation
occurs and R satisfies $R\tilde{R} = 1$. R is called a rotor and has a two-sided action,
i.e. a vector a will be rotated into a vector a' where $a' = R a \tilde{R}$. We can see
here that the concept of rotation takes the same form in any dimension and that we
can rotate any quantities, not simply vectors, e.g.
$R(a \wedge b)\tilde{R} = (R a \tilde{R}) \wedge (R b \tilde{R})$.
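
The identities above can be checked numerically. The following is a minimal
sketch, not the GABLE package used in the paper: it assumes the 3D algebra with
an orthonormal basis e1, e2, e3, stores a multivector as eight coefficients
indexed by a bitmask of basis vectors, and all function names (gp, reverse,
grade, vector, rotor) are hypothetical helpers introduced for illustration.

```python
import numpy as np

# Basis blades of G_3 indexed by bitmask: bit 0 = e1, bit 1 = e2, bit 2 = e3,
# so index 3 = e1e2, 5 = e1e3, 6 = e2e3, 7 = e1e2e3, and index 0 is the scalar.
GRADE = [bin(i).count("1") for i in range(8)]

def reorder_sign(a, b):
    """Sign picked up when the blade product e_a e_b is put in canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1.0 if swaps % 2 else 1.0

def gp(A, B):
    """Geometric product of two multivectors stored as length-8 arrays."""
    C = np.zeros(8)
    for i in range(8):
        for j in range(8):
            C[i ^ j] += reorder_sign(i, j) * A[i] * B[j]
    return C

def reverse(A):
    """Reversion: the grade-k part changes sign by (-1)^(k(k-1)/2)."""
    signs = np.array([(-1.0) ** (g * (g - 1) // 2) for g in GRADE])
    return signs * A

def grade(A, k):
    """Grade-k part <A>_k of a multivector."""
    mask = np.array([1.0 if g == k else 0.0 for g in GRADE])
    return mask * A

def vector(x, y, z):
    A = np.zeros(8)
    A[1], A[2], A[4] = x, y, z
    return A

def rotor(B):
    """R = exp(B) = cos|B| + sin|B| B/|B| for a pure bivector B of G_3."""
    theta = np.sqrt(-gp(B, B)[0])          # |B| = sqrt(-B^2)
    R = np.zeros(8)
    R[0] = np.cos(theta)
    if theta > 0:
        R += np.sin(theta) / theta * B
    return R

a, b = vector(1.0, 2.0, 0.5), vector(-0.3, 0.4, 2.0)
ab = gp(a, b)                              # scalar part a.b plus bivector part a^b

B = np.zeros(8)
B[3] = -np.pi / 4                          # B = -(pi/4) e1e2
R = rotor(B)
e1_rotated = gp(gp(R, vector(1, 0, 0)), reverse(R))

# Rotating a bivector versus rotating its factors first.
lhs = gp(gp(R, grade(ab, 2)), reverse(R))
a_rot = gp(gp(R, a), reverse(R))
b_rot = gp(gp(R, b), reverse(R))
rhs = grade(gp(a_rot, b_rot), 2)
```

In this basis the rotor $e^{B}$ with $B = -(\pi/4) e_1 e_2$ turns $e_1$ into $e_2$: with
the two-sided action $R a \tilde{R}$ the angle of rotation is twice the magnitude of B.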

We now have an algebra whose basic elements are these ‘geometric’ entities 
of different dimension. It is also possible to show that there is a well-defined 
linear algebra and calculus framework on the algebra.

3 Review of Geometric Calculus

We will see in later sections that when we formulate our models in terms of 
rotors we are led to optimisation processes over rotor and bivector manifolds.
In order to carry out such optimisations we require some general results from 
geometric calculus which we discuss in this section. 



3.1 Differentiation with Respect to Multivectors 



If X is a mixed-grade multivector, $X = \sum_r X_r$, and F(X) is a general
multivector-valued function of X, then the derivative of F in the A ‘direction’ is
written as $A * \partial_X F(X)$ (here we use * to denote the scalar part of the product
of two multivectors, i.e. $A * B = \langle AB \rangle$), and is defined as (with $\tau$ a
scalar)

$$A * \partial_X F(X) = \lim_{\tau \to 0} \frac{F(X + \tau A) - F(X)}{\tau} \qquad (3)$$



For the limit on the right hand side to make sense, A must contain only grades
which are contained in X and no others. If X contains no terms of grade r and $A_r$
is a homogeneous multivector, then we define $A_r * \partial_X = 0$. We can now use the
above definition of the directional derivative to formulate a general expression
for the multivector derivative $\partial_X$ without reference to one particular direction.
This is accomplished by introducing an arbitrary frame $\{e_k\}$ and extending this
to a basis (vectors, bivectors, etc.) for the entire algebra, $\{e_J\}$. Then $\partial_X$ is
defined as

$$\partial_X = \sum_J e^J (e_J * \partial_X) \qquad (4)$$

where $\{e^J\}$ is an extended basis built out of the reciprocal frame (if $\{e_i\}$ is a
basis for the space then $\{e^j\}$, where $e_i \cdot e^j = \delta_i^j$, is the reciprocal
frame). The directional derivative, $e_J * \partial_X$, is only non-zero when $e_J$ is one
of the grades contained in X (as previously discussed) so that $\partial_X$ inherits the
multivector properties of its argument X.

Another useful notation is 



$$\underline{f}(A) = (A * \partial_X) F(X) \qquad (5)$$

where $\underline{f}$ is termed the differential of F and is a linear function of A
acting at X; here we see the derivative as an approximation to the locally linearised
form of the function.
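
As an illustration of definition (3), the directional derivative can be estimated
by finite differences. The sketch below is an assumption-laden example rather
than part of the paper: it represents the 3D geometric algebra by the Pauli
matrices (so that the geometric product becomes the matrix product), uses a
central difference in place of the one-sided limit, and takes $F(X) = X^2$ with
X a bivector, for which the product rule gives $A * \partial_X F(X) = AX + XA$.
The helper names are hypothetical.

```python
import numpy as np

# Pauli matrices: a matrix representation of the 3D geometric algebra in which
# the geometric product is the matrix product.
S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)

def bivector(b12, b13, b23):
    """Bivector b12 e1e2 + b13 e1e3 + b23 e2e3 in the matrix representation."""
    return b12 * S1 @ S2 + b13 * S1 @ S3 + b23 * S2 @ S3

def directional_derivative(F, X, A, tau=1e-6):
    """Central-difference estimate of A * d_X F(X), cf. equation (3)."""
    return (F(X + tau * A) - F(X - tau * A)) / (2 * tau)

def F(X):
    return X @ X                           # F(X) = X^2

X = bivector(0.3, -1.2, 0.7)
A = bivector(1.0, 0.5, -0.4)               # A has the same grade as X

numeric = directional_derivative(F, X, A)
analytic = A @ X + X @ A                   # product rule for F(X) = X^2
```

For two bivectors in 3D the symmetric product AX + XA is a pure scalar, so both
matrices are multiples of the identity; the estimate is also linear in A, as
expected of the differential.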



3.2 The Taylor Expansion of Multivector Quantities 

The Taylor expansion is useful in deriving identities in multivector calculus. 
Let X, A and F(X) be as above and $\tau$ be a scalar, then the Taylor expansion
of $F(X + \tau A)$ is defined as

$$F(X + \tau A) = \sum_{r=0}^{\infty} \frac{\tau^r}{r!} (A * \partial_X)^r F(X) \qquad (6)$$

where $(A * \partial_X)^r F(X) = (A * \partial_X)[\ldots (A * \partial_X)[(A * \partial_X) F(X)]]$.
Here $(A * \partial_X)^r$ is a grade preserving operator, therefore resulting in correct
grade-matching in equation (6). It can easily be verified that this form gives the
correct expansion for some simple cases (e.g. use $F(X) = X^2$ with X a bivector).

3.3 The Chain Rule for Multivector Differentiation 

In any practical analysis the use of the chain rule is essential. For multivector 
differentiation it is formulated as follows. Define 

H{X) = G[F(X)]. 

Using equation (3) and equation (6) the chain rule for multivector differentiation 
can be expressed [8], [12] as 

$$(A * \partial_X) H(X) = [\{(A * \partial_X) F(X)\} * \partial_{F(X)}]\, G[F(X)] \qquad (7)$$

Writing equation (7) in terms of differentials

$$\underline{h}(A) = \underline{g}(\underline{f}(A)) \qquad (8)$$

we have the remarkable result that when expressed in this way the chain-rule 
has a very simple intuitive form. 



4 Rotor Estimation 

In many tracking and computer vision problems it may be desirable to represent 
the set of rotations that describe the motion as a time evolving state. Suppose 
we have a vector which is undergoing some rotational motion in 3D and a series
of measurements. If $R_n = e^{B_n}$, where $R_n$ and $B_n$ are the rotor and bivector
representing the rotation at time t = n, then it will generally be possible to write
down Kalman-filter type equations governing the evolution of the state and the
observations. If $v_n^i$ is the ith (i = 1, ..., L) observation vector at time n and
$u_{n-1}^i$ is some estimate of $v^i$ at the previous time-step, then one possible model
is that our system evolves so that the changes in the rotational bivectors are
small and the rotors approximately rotate $u^i$ to $v^i$, i.e.

$$B_n = B_{n-1} + \Delta B_n \qquad (9)$$

$$v_n^i = R_n u_{n-1}^i \tilde{R}_n + q_n^i \qquad (10)$$

where $\Delta B_n$ is essentially the error in our smoothness model and $q_n^i$ is the
observation error. Since bivectors always square to give a scalar, we can write
$B_n = \theta \hat{B}_n$ with $\hat{B}_n^2 = -1$, a unit bivector.

Given the state (9) and observation (10) equations we may now construct a 
weighted least squares approach in which we minimize the errors in these two 
equations. Under the assumption of Gaussian errors and subject to certain other
conditions, [9], such an approach would be optimal. Consider the minimisation
of the following cost function:



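[Equation (11): the cost function C, a sum over i = 1, ..., L of the weighted
squared observation errors of equation (10) together with the weighted squared
state error of equation (9); its typeset form did not survive extraction.]    (11)

where $w_i$, $w_o$ are weighting factors. Under certain conditions this is the same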
cost function as used in the Kalman Filter [10] correction at a given time step. 
Essentially we have only a correction part and no prediction phase; however the 
simplicity of this formulation may be quite adequate for the particular tracking 
applications we will discuss and has been applied successfully in a number of 
similar cases [11]. 

Differentiation of this expression with respect to $B_n$ involves some
straightforward but lengthy manipulation (details of the steps can be found
in [12]), eventually leading us to a bivector equation for the least squares solution
for $B_n$:

[Equation (12): a bivector equation in $\Gamma_i(B_n)$, $R_n$, $\hat{B}_n$, $\sin(\theta)$,
the weights $w_i$, and the term $(B_n - B_{n-1})/w_o$; its typeset form did not survive
extraction.]    (12)



where $\Gamma_i(B_n) = u_{n-1}^i \wedge \tilde{R}_n v_n^i R_n$ and $\theta = |B_n| = \sqrt{-B_n^2}$. Equation (12) can be
solved in practice either by simple iterative techniques or by some optimization 
method such as the Levenberg-Marquardt algorithm [13]. Note here that we have 
found a least squares solution to the problem without introducing any coordi- 
nates or particular representation (Euler angles, direction cosines etc.) for the 
rotations - this guarantees that our solution will not depend on the coordinate 
frame that we choose to numerically solve our equations in. When we are talking
about the evolution of a rotor field we are really talking about the evolution of
the bivectors representing that field; it therefore seems natural that we should
work with the bivectors as the basic quantities. Indeed it has been shown [3], [14]
that for rotational invariance we must interpolate rotations by interpolating the
representative bivectors. It is also our view that fewer singularities will occur in
this bivector approach.
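
To make the weighted least squares idea concrete, the following sketch minimises
a cost built from the state and observation equations. It is not the bivector
equation derived in the paper: as stated assumptions, the rotation is
parametrised by an ordinary rotation vector b (axis times angle, standing in for
the bivector up to the paper's sign and factor-of-two conventions), the cost is
taken to be a plain weighted sum of squared errors, and the minimisation is a
Gauss-Newton iteration with a finite-difference Jacobian. All names and
parameters are hypothetical.

```python
import numpy as np

def rotation_matrix(b):
    """Rodrigues formula: rotation by angle |b| about the axis b / |b|."""
    theta = np.linalg.norm(b)
    if theta < 1e-12:
        return np.eye(3)
    k = b / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def residuals(b, U, V, b_prev, w_obs, w_state):
    """Stacked, weighted errors of the observation and state equations."""
    obs = (V - U @ rotation_matrix(b).T) / np.sqrt(w_obs)
    state = (b - b_prev) / np.sqrt(w_state)
    return np.concatenate([obs.ravel(), state])

def estimate(U, V, b_prev, w_obs=1.0, w_state=1.0, iters=20, h=1e-6):
    """Gauss-Newton minimisation of the sum of squared residuals."""
    b = b_prev.copy()
    for _ in range(iters):
        r = residuals(b, U, V, b_prev, w_obs, w_state)
        J = np.empty((r.size, 3))
        for k in range(3):
            e = np.zeros(3)
            e[k] = h
            J[:, k] = (residuals(b + e, U, V, b_prev, w_obs, w_state)
                       - residuals(b - e, U, V, b_prev, w_obs, w_state)) / (2 * h)
        b = b + np.linalg.lstsq(J, -r, rcond=None)[0]
    return b

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))                     # previous estimates, one per row
b_true = np.array([0.2, -0.1, 0.3])
V = U @ rotation_matrix(b_true).T               # noise-free observations
b_prev = b_true + np.array([0.05, -0.03, 0.04])
b_est = estimate(U, V, b_prev, w_obs=1.0, w_state=1e6)

cost_before = np.sum(residuals(b_prev, U, V, b_prev, 1.0, 1e6) ** 2)
cost_after = np.sum(residuals(b_est, U, V, b_prev, 1.0, 1e6) ** 2)
```

With noise-free observations and a very weak smoothness term the estimate
returns to the true rotation; increasing the weight on the state term pulls the
estimate towards the previous value instead.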



5 Application to Articulated Motion 

We now look at applying the mathematics of the previous section to the case 
where the rotations describe the movement of an articulated body. We begin by 
setting up the problem in the required form. Let us begin by considering the
simple model of a number of linked rigid rods joined by revolute joints (with
all rotational degrees of freedom). Let $x_i^n$ be the position vector of the ith joint
at time n, and define $x_{i,i+1}^n = x_{i+1}^n - x_i^n$. Let $R_i^n$ be the rotor which
represents rotation about the ith joint at time n. If the motion at the ith joint is
constrained so as only to allow rotations about a unit axis $\hat{n}_i$, then $R_i^n$ would
be a rotation of $\theta_i$ about the rotated $\hat{n}_i$ axis
($= R_{i-1}^n \ldots R_1^n \hat{n}_i \tilde{R}_1^n \ldots \tilde{R}_{i-1}^n$). In figure 1 we
would therefore be able to write, say, [expression lost in extraction], which can be
generalised to give [equation lost in extraction].






But it can be shown [12] that

[Equation (13): an identity relating the rotors $R_i^n$ to the rotors
$R_{1,0}^n, R_{2,0}^n, \ldots, R_{i,0}^n$; its typeset form did not survive
extraction.]    (13)

where $R_{i,0}^n$ is about the $\hat{n}_i$ axis (i.e. previous rotations not being applied).
We therefore see that the bivectors corresponding to the individual rotors are now
independent.



5.1 Articulated Rotor Estimation 



As in Section 4, we can now postulate an evolution model of the form 



$$B_{i,0}^n = B_{i,0}^{n-1} + \Delta B_{i,0}^n \qquad (14)$$

$$x_{i,i+1}^n = R_{1,0}^n R_{2,0}^n \ldots R_{i-1,0}^n R_{i,0}^n \, x_{i,i+1}^{n-1} \, \tilde{R}_{i,0}^n \tilde{R}_{i-1,0}^n \ldots \tilde{R}_{2,0}^n \tilde{R}_{1,0}^n$$

and an appropriate cost function (as in equation 11). Thus we are able to
differentiate with respect to each of the bivectors and iteratively solve the
resulting equations as outlined in the previous section. We note here that if there
are constraints at any of the joints (which, in practice, is often the case), such
constraints are easily allowed for in the model.



6 Results 

In order to check the validity of the expressions above they were incorporated into 
a tracking problem. Here a real world articulated object was observed through 
a system of calibrated perspective cameras with arbitrary orientations and posi- 
tions. Firstly the algorithms were applied to simulated data and secondly to real 
data taken with a 3-camera system. For the simulations we are able to compare 
the results with the known ‘truth’, whereas for the real data the performance 
of the algorithm was judged by its ability to track and reconstruct the object 
reliably. 






6.1 Simulations 

An articulated model which has four linked points with total rotational free- 
dom at each point (except the last point) was simulated. Errors were added 
at the camera-plane level. The model was simulated with the dynamical
behaviour described by (14). Bivector values, lengths and the initial positions of
the model were assigned randomly. The $\Delta B$ values and the projection of the
observation noise onto the image plane were taken as pseudo-random sequences
with Gaussian pdfs whose variances were a fraction of the maximum rotational
bivector and a fraction of the maximum link length, respectively. Two experiments
with the variance pairs given by (0.001, 0.00001) and (0.01, 0.01) were conducted
and these two experiments will be referred to as simul1 and simul2. Note that
independence of the distributions with respect to each other and to different time steps
was assumed in these simulations. Although the assumptions about indepen- 
dence and Gaussianity do not strictly hold in practice, these formulations are 
generally a reasonable approximation [10], [11]. In the above experiments, cor- 
respondences in the image planes were achieved by a global point assignment 
scheme [15]. Reconstruction of the 3D position was performed using the
algorithm given in [5]. Corrections to the bivectors were calculated using (12) and
these corrected bivectors were used to predict the world position at the next
time instance. Only two iterations over the number of links were carried out,
estimating each rotor/bivector value associated with each link. Several different
weighting factors ($w_i$, $w_o$) were used and were assumed to be the same for
every link. These weighting pairs were (1, 1), (1, 50) and (50, 1) respectively. A set
of experiments that only uses the previous observations as current predictions
has results labelled as (0,0). The GA software package GABLE [7] was used
for the simulations. The sum of the squared difference between the actual world
points and the predicted world points for each time instant (sdiff), the sum of
link lengths at each time instant (slink) and the length of a single example link
(slength) are plotted in Figure 2.

From Figure 2 we can see that the weighting factors do not affect the results 
significantly for given variances. As expected the model preserves the lengths 
of the links. When the previous observation is used as the current prediction,
the lengths of links in the reconstruction change considerably, so we see that
although it appears that this method gives smaller values of sdiff, it does so by
violating the physical constraints of the system.

6.2 Real Data 

The algorithms were tested on two sets of real optical motion capture data taken 
with a 50Hz 3-camera system; these data sets, referred to as ‘golfl’ and ‘jlwalkS’,
represent the arm movements in a golf swing and a person walking, respectively.
The captured data consisted of a number of bright points in each image, coming
from retroreflective markers on the subject.

Here the tracking algorithm used was as described for the simulated data but
observed world points were used rather than real world points (now unknown).
Initial rotor values were estimated using the method described in [5], using 3
frames and manually identifying the correct association of links in images. The same
quantities, sdiff, slink and slength, are plotted in Figure 2, except that now, for the
calculation of sdiff, the reconstructed world points (i.e. the world observation)
from the current image planes were used instead of the actual world points, which
are, of course, unknown.

Fig. 2. First column shows the results of simulated data with variance pairs
(0.001,0.00001) and (0.01,0.01), with three plots for each variance pair.
These correspond to sdiff, slink and slength plotted against frame number. Each
graph contains plots for the different weighting schemes as shown in the legend.
The 0_0 line indicates no estimation is performed and the previous observation
is used as the current prediction. The second column has the results from real
data sets ‘golfl’ and ‘jlwalkS’. These also have two sets of three plots as in the
simulated case.

Fig. 3. Left-hand figure shows the magnitude and the direction of the bivector
and the direction of the path taken by the normal unit vector to the bivector
over time for the rotor at the elbow joint for ‘golfl’ ((1,1) weighting used).
The right-hand figure is similar but shows the results for the knee joint rotor in
‘jlwalkS’.

It was found that in this real-data case, in order to achieve reliable tracking 
it was necessary to weight the prediction term more heavily. Again, it was found 
that using the previous observations as the current prediction led to poor recon- 
structions due to its inability to preserve the physical model. Overall the method 
performed well on the given data sets and was able to track and reconstruct all 
of the motions considered. 

The advantage of estimating rotational bivectors is evident in Figure 3. It
shows the rotational bivector estimated at the elbow and knee joints in ‘golfl’
and ‘jlwalkS’ respectively. It is possible to gain much more information from
figures like these than one would get from scalar plots of Euler angles. For
example, from figure 3 it can be seen that the forearm has rotated through roughly
the same angle between each frame with respect to the elbow, since the
bivectors shown (see the circular disc) have approximately the same diameter.
We also see that the direction of the forearm in this estimated model changes
smoothly over time and has a graceful transition when the arm starts to swing in
the opposite direction. It is also interesting to look at the bivector evolution for
the knee rotor; it is clear that in this case both the magnitude and orientation
of the rotation plane change much more. This is indicative of real changes in the
orientation of the knee and could present us with a useful visualisation tool for
investigative studies on gait.

7 Conclusions and Future Work

In this paper we have discussed the use of geometric algebra in modelling evolving 
systems which can be described by rotors. In particular, the issue of tracking
has been addressed using a least-squares technique which employs multivector
calculus on the defining bivectors. Examples of applying the
resulting technique to real and simulated data were shown which illustrated
the validity of the method. The concepts described may have many uses in
motion analysis. In particular, as shown in figure 3, an important application 
of the techniques may be to extract relevant rotors from pre-tracked motion 
data - this can be used to understand specific motions, detect abnormalities 
etc. The method outlined has set up the forward kinematics for a particular 
articulated model - we also envisage that the rotor formulation will be very 
useful for inverse kinematics (inferring the state from the observations) which 
is of vital importance in control systems for walking robots etc.

Abstract. In this paper we consider all imaging systems that consist of
reflective and refractive components (called catadioptric) and possess
a unique effective viewpoint. Conventional cameras are a special case of
such systems if we imagine a planar mirror in front of them. We show that 
all unique viewpoint catadioptric systems can be modeled with a two- 
step projection: a central projection to the sphere followed by a projec- 
tion from the sphere to an image plane. Special cases of this equivalence 
are parabolic projection, for which the second map is a stereographic 
projection, and perspective projection, for which the second map is cen- 
tral projection. Certain pairs of catadioptric projections are dual by the 
mapping which takes conics in the image plane to their foci. The foci of 
line images are points of another, dual, catadioptric projection; and vice 
versa, points in the image are foci of line images in the dual projection. 
The proved unifying model for all central catadioptric projections gives 
us further insight to practical advantages of catadioptric systems. 



1 Introduction 

In the past decade the vision community has seen a resurgence in the intelligent 
design of imaging sensors. It has been recognized that perspective cameras are 
not necessarily best suited to most tasks. Many tasks require constant and si- 
multaneous omnidirectional vision. Vision for robotics, immersive telepresence, 
videoconferencing, and mosaicing, are all examples of tasks in which this is 
the case. A popular solution has been to use the already available technology 
(perspective or approximately orthographic lenses and rectangular CCD arrays) 
in combination with properly designed mirrors, thereby achieving the goal of 
omnidirectional vision. It is often desirable to choose systems whose locus of 
viewpoints is a single point. In doing so the complexity of interpreting the in- 
formation obtained is reduced, and in addition it is possible to generate a more 
natural (to us) equivalent perspective image in an arbitrary direction (assuming 
that the image transformation is known and calibrated). Nayar et al. have studied
the various configurations and shapes of mirrors which yield a single effective
viewpoint. In general they are formed by a combination of mirrors which are
surfaces of revolution and whose cross-section is a conic, and a perspective or
orthographically projecting camera. They have shown that any combination of
these mirrors and type of camera is equivalent to a single conical mirror and
camera.

Let us briefly summarize recent activities on omnidirectional vision. A
panoramic field of view camera was first proposed by Rees [9]. After 20 years the
concept of omnidirectional sensing was reintroduced in robotics [12] for the purpose
of autonomous vehicle navigation. In the last five years, several omnidirectional 
cameras have been designed for a variety of purposes. The rapid growth of mul- 
timedia applications has been a fruitful testbed for panoramic sensors [5,6,8] 
applied for visualization. Another application is telepresence [10, 1] where the 
panoramic sensor achieves the same performance as a remotely controlled rotat- 
ing camera with the additional advantage of an omnidirectional alert awareness. 
Srinivasan [2] designed omnidirectional mirrors that preserve ratios of elevations 
of objects in the scene and Hicks [4] constructed a mirror-system that rectifies 
planes perpendicular to the optical axis. The application of mirror-lens systems 
in stereo and structure from motion has been prototypically described in [11,3]. 
The fact that lines project to conics is mentioned in the context of epipolar lines 
by Svoboda [11] and Nayar [7]. 

In this paper we investigate the geometric properties induced in the image by 
a single viewpoint catadioptric projection. We show that in all cases the projec- 
tion is equivalent to a parameterized projection of a sphere. Here we mean that 
a point in space is first projected to the sphere from the center, then projected 
from a point on an axis to an image plane perpendicular to this axis; the posi- 
tion along this axis is the parameter. This gives us a canonical representation 
of any catadioptric projection. In particular, in the case of a parabolic mirror, 
this is equivalent to projection from the pole of this axis, known as stereographic 
projection. Reflection by a planar mirror, i.e. perspective projection, is equiva- 
lent to projection from the center of the sphere. This enables us to more easily 
determine the set of conics which are images of lines, and also their invariants, 
by determining the images of great circles by these projections. 

In addition we show that each of these catadioptric projections has a dual. 
The mapping between a projection and its dual is that which returns the foci of 
a conic. The foci of a line image are points in its dual projection and points in 
an image are foci of line images in the dual projection. 

We will show that, due to the nature of the geometry, one may calibrate any
catadioptric system in a single frame with as few as two lines, except in singular
cases: three lines are necessary for the parabola, and calibration is impossible
with a perspective camera.

2 Connection with the Projection of the Sphere 

We will first provide formulas for point projection via parabolic and hyperbolic 
mirrors, then demonstrate their equivalence with projection of the sphere. Then 
we shall describe the induced projective geometries and their constituent points
and lines.

2.1 Parabolic Projection




Fig. 1. Cross-section of a parabolic mirror. The image plane is through the focal point.
The point in space P is projected to the antipodal points P' and P'', which are then
orthographically projected to Q' and Q'' respectively.



Assume that a regular paraboloid is placed so that its focal point is the
origin, and its axis is the z-axis, see figure 1. Let 4p be the diameter of the circle
obtained by intersection of the paraboloid and the plane z = 0 (4p is the latus
rectum). The projection of a point is determined by the orthographic projection
to z = 0 of the intersection of the paraboloid with the line through the point P and
focal point F. When the point lies on the z-axis, the image is a single point, otherwise
the image is a set of two points. If $r = \sqrt{x^2 + y^2 + z^2}$, the point(s) of
intersection are

$$\pm\left(\frac{2px}{r \mp z},\; \frac{2py}{r \mp z},\; \frac{2pz}{r \mp z}\right).$$

The orthographic projection yields

$$\pm\left(\frac{2px}{r \mp z},\; \frac{2py}{r \mp z}\right).$$

Let X be the set of all such images of points; then $q_p : \mathbb{R}^3 \to X$ is defined by

$$q_p(x, y, z) = \pm\left(\frac{2px}{r \mp z},\; \frac{2py}{r \mp z}\right).$$


2.2 Hyperbolic and Elliptical Projections 

Again assume the hyperboloid (of two sheets) is placed so that its focal point
$F_1$ is the origin and axis is the z-axis, see figure 2. Assume the other focal point
$F_2$ is placed at (0, 0, −d) and the latus rectum is 4p. The projection of a point
P is defined as the perspective projection, from $F_2$ to the plane z = 0, of the
intersection of the hyperboloid with the line through P and $F_1$. If Y is the set of
image points, and again $r = \sqrt{x^2 + y^2 + z^2}$, then the hyperbolic projection
$r_{p,d} : \mathbb{R}^3 \to Y$ is defined by

$$r_{p,d}(x, y, z) = \pm\left(\frac{2xdp}{dr \mp az},\; \frac{2ydp}{dr \mp az}\right),$$

where $a = \sqrt{d^2 + 4p^2}$. Notice that the limit of this map as d approaches
$+\infty$ is the map $q_p$, as expected.

Fig. 2. Cross-section of a hyperbolic mirror, again the image plane is through the focal
point. The point in space P is projected to the antipodal points P' and P'', which are
then perspectively projected to Q' and Q'' from the second focal point.

Assuming the foci of an ellipse are $F_1 = (0, 0, 0)$ and $F_2 = (0, 0, -d)$, then
the map defined by

$$r'_{p,d}(x, y, z) = \pm\left(\frac{2xdp}{dr \pm az},\; \frac{2ydp}{dr \pm az}\right)$$

is the elliptical projection function. Note that $r_{p,d}(x, y, z) = r'_{p,d}(x, y, -z)$. So
hyperbolic and elliptical projections where p and d are equal differ only by a
reflection by the z = 0 plane.



2.3 Perspective Projection 

Perspective projection may be viewed as a special case of catadioptric projection; 
for perspective projection is unchanged, except by viewpoint, by the presence 
of a planar mirror. Assume that the focal point of a perspective projection is 
located at (0, 0, −2f), where f is the focal length. Also assume that the image
plane is at z = −f, but also within the same plane lies a planar mirror. The
equivalent focal point is then (0, 0, 0). The projection is a function
$p_f : \mathbb{R}^3 \to \mathbb{R}^2$,
and unlike the projections mentioned above maps a point in space to a single
image point. It is given by

$$p_f(x, y, z) = \left(-\frac{fx}{z},\; -\frac{fy}{z}\right).$$

2.4 Spherical Projection

Let a sphere of unit radius be centered at the origin. A point (x, y, z) is mapped
by central projection to the antipodal pair $(\pm x/r, \pm y/r, \pm z/r)$ on the sphere
(figure 3). We now wish to project the antipodal pair from the point (0, 0, l), where
$0 \le l \le 1$, to the plane z = −m, $m \neq -l$. We achieve this with the map defined by

$$s_{l,m}(x, y, z) = \pm\left(\frac{x(l + m)}{lr \mp z},\; \frac{y(l + m)}{lr \mp z}\right).$$

If we change the plane, the projection differs only by a scale, as

$$s_{l,m'}(x, y, z) = \frac{l + m'}{l + m}\, s_{l,m}(x, y, z).$$

Fig. 3. A point P = (x, y, z, w) is projected via s to two antipodal points
(±x, ±y, ±z)/r on the sphere. The two antipodal points are projected to the image
plane z = −m via projection from the point (0, 0, l).



2.5 Unification 

How are $q_p$, $r_{p,d}$, $p_f$, and $s_{l,m}$ related? It is a simple matter of
substitution to prove the following.

Theorem 1. Projective Equivalence. Catadioptric projection with a single 
effective viewpoint is equivalent to projection to a sphere followed by projection 
to a plane from a point. 

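Proof. We have the following relationships for the projection functions:

$$r_{p,d}(x, y, z) = s_{l,m}(x, y, z) \quad\text{with}\quad l = \frac{d}{\sqrt{d^2 + 4p^2}},\; m = \frac{d(1 - 2p)}{\sqrt{d^2 + 4p^2}}$$

$$r'_{p,d}(x, y, z) = s_{l,m}(x, y, -z) \quad\text{with}\quad l = \frac{d}{\sqrt{d^2 + 4p^2}},\; m = \frac{d(1 - 2p)}{\sqrt{d^2 + 4p^2}}$$

$$q_p(x, y, z) = s_{1,2p-1}(x, y, z)$$

$$p_f(x, y, z) = s_{0,f}(x, y, z)$$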

The first corresponds to hyperbolic projection, the second to elliptical projection, 
the third to parabolic projection, and the fourth to perspective projection. □ 
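
The two-step construction in Theorem 1 is easy to check numerically. The sketch
below is an illustration under stated assumptions: it takes the closed form
$s_{l,m}(x, y, z) = \pm(x(l+m)/(lr \mp z),\, y(l+m)/(lr \mp z))$, rebuilds the same
map geometrically (central projection to the unit sphere, then projection from
(0, 0, l) to the plane z = −m), and compares the parabolic and perspective special
cases with the explicit formulas $\pm 2p(x, y)/(r \mp z)$ and $-f(x, y)/z$. The function
names and the sample values are hypothetical.

```python
import numpy as np

def s(l, m, P):
    """Closed form: both images of P under the sphere model s_{l,m}."""
    x, y, z = P
    r = np.sqrt(x * x + y * y + z * z)
    return [(l + m) * np.array([x, y]) / (l * r - z),
            -(l + m) * np.array([x, y]) / (l * r + z)]

def two_step(l, m, P):
    """Same map built geometrically: sphere first, then the plane z = -m."""
    images = []
    for sign in (1.0, -1.0):
        S = sign * np.array(P) / np.linalg.norm(P)   # point on the unit sphere
        C = np.array([0.0, 0.0, l])                  # centre of projection
        t = (-m - l) / (S[2] - l)                    # C + t (S - C) meets z = -m
        images.append((C + t * (S - C))[:2])
    return images

P, p, f = (1.0, 2.0, 0.5), 0.25, 0.8
x, y, z = P
r = np.linalg.norm(P)

general = s(0.6, 0.3, P)
parabolic = s(1.0, 2 * p - 1, P)          # q_p = s_{1, 2p-1}
perspective = s(0.0, f, P)                # p_f = s_{0, f}
```

For l = 0 the two antipodal points have the same image, which is why perspective
projection maps a point in space to a single image point.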
Now let l and m be chosen. We are now able to prove the following, where
the fronto-parallel plane is the plane z = 0 and its horizon is the image of the
points at infinity contained by this plane.

Proposition 1. The image of a line in space is a conic which intersects the 
fronto-parallel horizon antipodally. 

Proof. A line in space is projected to a great circle on the sphere. To determine
the projection of this great circle to the image plane we determine the cone
whose vertex is the point of projection and which goes through the great circle.
The intersection of this cone with the image plane is the line image and is clearly
a conic, being a section of a cone.

The fronto-parallel horizon is the projection of the equator and is a circle
whose center is the image center and whose radius is $(l + m)/l$. Any great circle
intersects any other great circle antipodally, and therefore also the equator. The
projection of their intersection is also antipodal on the fronto-parallel horizon.

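If the normal of the plane containing the great circle is $\hat{n} = (n_x, n_y, n_z)$ then
the conic has the following characteristics.

$$f_\pm = \left(\frac{(l + m) n_x}{n_z \mp \sqrt{1 - l^2}},\; \frac{(l + m) n_y}{n_z \mp \sqrt{1 - l^2}}\right) \qquad (1)$$

[The expressions for a and b did not survive extraction; the one surviving fragment
reads $l(l + m) n_z / (l^2 - n_x^2 - n_y^2)$.]

where $f_\pm$ are the foci, a is the minor axis, and b is the major axis. Notice that
the foci lie on a line through the origin (image center). □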
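
Proposition 1 can be illustrated numerically. The sketch below assumes the
closed form of $s_{l,m}$ on the unit sphere (r = 1), samples a great circle with a
given normal, and tests whether the image points satisfy a single conic equation
$AX^2 + BXY + CY^2 + DX + EY + F = 0$ by looking at the smallest singular value of
the corresponding design matrix. The function names, the sample normal, and the
values of l and m are hypothetical choices.

```python
import numpy as np

def line_image(n, l, m, count=12):
    """Image under s_{l,m} of points on the great circle with normal n."""
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    u = np.cross(n, [1.0, 0.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)                                   # u, v span the plane
    t = np.linspace(0.0, 2 * np.pi, count, endpoint=False)
    S = np.outer(np.cos(t), u) + np.outer(np.sin(t), v)  # points on the circle
    scale = (l + m) / (l - S[:, 2])                      # r = 1 on the sphere
    return S[:, :2] * scale[:, None]

def conic_residual(points):
    """Smallest singular value of the conic design matrix, relative to the largest."""
    X, Y = points[:, 0], points[:, 1]
    D = np.column_stack([X * X, X * Y, Y * Y, X, Y, np.ones_like(X)])
    sv = np.linalg.svd(D, compute_uv=False)
    return sv[-1] / sv[0]

l, m = 0.8, 0.5
image = line_image([0.3, 0.2, 1.0], l, m)    # a generic line image
equator = line_image([0.0, 0.0, 1.0], l, m)  # the fronto-parallel horizon
```

Sampling the full circle covers both points of every antipodal pair, so a single
branch of the formula suffices. The equator maps to a circle of radius (l + m)/l
about the image centre.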

Let

$$\Pi = \{(\pm x, \pm y, \pm z) \mid x^2 + y^2 + z^2 = 1\}$$

and let

$$\Lambda = \{[\pm n_x, \pm n_y, \pm n_z] \mid n_x^2 + n_y^2 + n_z^2 = 1\},$$

where

$$[n_x, n_y, n_z] = \{(x, y, z) \in \Pi \mid x n_x + y n_y + z n_z = 0\}.$$

Then $\Pi$ is the set of antipodal point pairs on the sphere, and $\Lambda$ is the set
of great circles, where each $[n_x, n_y, n_z]$ is the set of points on the great circle.
Now let $\pi_{l,m} = (s_{l,m}(\Pi), s_{l,m}(\Lambda))$. $\pi_{l,m}$ is the projective plane
generated by $s_{l,m}$, and we call it a catadioptric projective plane. Let
$\Pi(\pi_{l,m}) = s_{l,m}(\Pi)$ and $\Lambda(\pi_{l,m}) = s_{l,m}(\Lambda)$.



3 Duality 

Let $(n_x, n_y, n_z)$ be the normal of a plane; we found that the foci of the line
image are given by equation 1. Notice that this is the projection of the same
point $(n_x, n_y, n_z)$ by $s_{l',m'}$, where $l' = \sqrt{1 - l^2}$ and
$m' = l + m - \sqrt{1 - l^2}$.

Before giving the theorem on duality we define the operators meet ($\vee$) and
join ($\wedge$), then we will give a proposition. Let $p = (x_1, y_1, z_1)$ and
$q = (x_2, y_2, z_2)$ be points on the sphere. The great circle through them has normal
$[p \times q]$. The image line through them is then the image of this great circle. Let
$p \wedge q$ be this “line”. Similarly, if $p = [\hat{m}] = [m_x, m_y, m_z]$ and
$q = [\hat{n}] = [n_x, n_y, n_z]$ are the normals of two great circles, let $p \vee q$ be
the point $\hat{m} \times \hat{n}$, i.e. the intersection of the two lines. The composition
of these operators with a projection from the sphere to the image plane yields binary
operators in the catadioptric projective plane. For example, if
$p, q \in \Pi(\pi_{l,m})$, then

$$p \wedge q = s_{l,m}\left(s_{l,m}^{-1}(p) \wedge s_{l,m}^{-1}(q)\right).$$



Proposition 2. Let $\{[\hat{m}_i]\}$ be a set of line images all of which intersect a point
$\hat{n}$, i.e. for all i, $\hat{m}_i \cdot \hat{n} = 0$. Then the locus of foci of the line
images lies on a conic whose foci coincide with the point $\hat{n}$.

Proof. Because of rotational symmetry, we may assume without loss of generality
that $n_y = 0$. This implies that

$$\{[\hat{m}_i]\} = \{[-n_z \sin\theta_i,\; \cos\theta_i,\; n_x \sin\theta_i]\}.$$

Then the foci are,

$$f_\pm = \left(\frac{-(l + m) n_z \sin\theta}{n_x \sin\theta \mp \sqrt{1 - l^2}},\; \frac{(l + m) \cos\theta}{n_x \sin\theta \mp \sqrt{1 - l^2}}\right).$$

But this is just one of the points $\hat{m}_i$ projected by $s_{l',m'}$, and one finds that
the second focus is the second projected point. Therefore this point is in the image
of the line $\hat{n}$ by this same projection. Its foci are

$$\left(\frac{(l + m) n_x}{n_z \mp l},\; 0\right),$$

which is the projection of $\hat{n}$ by $s_{l,m}$. □

We use this proposition to prove the following theorem on duality. 

Theorem 2. If $\pi_{l,m} = (\Pi_1, \Lambda_1)$ and $\pi_{l',m'} = (\Pi_2, \Lambda_2)$ are two catadioptric
planes such that

\[ l^2 + l'^2 = 1 \quad\text{and}\quad l + m = l' + m', \]

then $f_{l,m}$, which gives the foci of a line image in the context of some catadioptric
plane, maps as follows,

\[ f_{l,m} : \Lambda_1 \to \Pi_2, \qquad f_{l',m'} : \Lambda_2 \to \Pi_1, \]

and their inverses exist. In addition, incidence relationships are preserved by
$f_{l,m}$:

\[ f_{l,m}(p_1 \wedge p_2) = f^{-1}_{l',m'}(p_1) \vee f^{-1}_{l',m'}(p_2), \]

\[ f^{-1}_{l,m}(l_1 \vee l_2) = f_{l',m'}(l_1) \wedge f_{l',m'}(l_2), \]

where $p_1, p_2 \in \Pi_1$ and $l_1, l_2 \in \Lambda_2$. We therefore call the projective planes $\pi_{l,m}$
and $\pi_{l',m'}$ dual catadioptric projective planes.

Proof. We have already shown the first part of the theorem; it only remains
to show that incidence relationships are preserved. But this follows from the 
proposition and the fact that incidence relationships are already known to be 
preserved on the sphere by the mapping taking antipodal points to great circles 
and vice versa. □ 




Fig. 4. The two ellipses are projections of two lines in space. Their foci F1, F2 and
G1, G2, respectively, lie on a hyperbola containing the foci of all ellipses through U
and V. The foci of this hyperbola are the points U and V.



4 Consequences 

4.1 Stereographic Projection 

Stereographic projection is a conformal mapping from the sphere to the plane:
great circles on the sphere are mapped to circles which meet at the same angles
as the great circles themselves. This means that the horizons of planes which
are perpendicular to each other will be orthogonal. It also implies the following.

Corollary 1. In a stereo pair of parabolic catadioptric systems, angles between
epipolar line images are constant and equal to the angles between the planes
containing the points.
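
The conformal property can be illustrated with a small numerical sketch. It assumes the standard stereographic projection from the north pole $(0, 0, 1)$ onto the plane $z = 0$, under which the great circle with unit normal $n$ (with $n_z \neq 0$) maps to the circle with center $(-n_x/n_z, -n_y/n_z)$ and radius $1/|n_z|$. The function names are illustrative only.

```python
import math

def unit(v):
    """Normalize a 3-vector."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def stereographic(p):
    """Project a sphere point from the north pole (0, 0, 1) onto the plane z = 0."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))

def line_image_circle(n):
    """Circle (cx, cy, r) that is the image of the great circle with unit normal n."""
    nx, ny, nz = n
    return (-nx / nz, -ny / nz, 1.0 / abs(nz))

def circle_angle_cos(c1, c2):
    """Cosine of the angle between the radii of two circles at an intersection point."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    return (r1 * r1 + r2 * r2 - d2) / (2.0 * r1 * r2)

# Two planes through the sphere center (both normals chosen with n_z > 0).
n1 = unit((0.2, -0.4, 0.9))
n2 = unit((-0.5, 0.1, 0.7))

cos_planes = sum(a * b for a, b in zip(n1, n2))
cos_circles = circle_angle_cos(line_image_circle(n1), line_image_circle(n2))

# Sanity check: a point of the first great circle lands on the first image circle.
h = math.hypot(n1[0], n1[1])
on_circle = stereographic((n1[1] / h, -n1[0] / h, 0.0))
cx, cy, r = line_image_circle(n1)
residual = math.hypot(on_circle[0] - cx, on_circle[1] - cy) - r
```

Under these assumptions `cos_circles` equals `cos_planes`, i.e. the image circles intersect at the angle between the two planes.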



4.2 Calibration 



The geometric properties discovered here prove to be very useful in calibration. 
The first is the invariant property of each line image, namely that every line 
image intersects the fronto-parallel horizon antipodally. This is extremely useful 
because the line image can be used as the cross section of a surface of revolution 
about its major axis. Then every surface so created, either a hyperboloid, ellip- 
soid, or sphere, always contains the points (0, 0,±2p). Therefore intersection of 
at least three of these surfaces will yield the image center and focal length. 

The second useful property is duality, by which we are able to determine 
the second intrinsic parameter, d, in the case of the hyperbolic mirror. When 
a line image is a conic, the dual points are its foci, and through these points 
there exists a line image in the dual projection whose foci are on the original 
conic. This second line image corresponds to the great circle perpendicular to
that of the original line image, and their intersections are points on the equator.
With a second line image in the dual projection for every original line image, 
we calibrate both the original projection and the dual projection simultaneously, 
under the assumption that the image centers are identical, using the intersection 
of surfaces described above. Together this pair of calibrated systems will encode 
both d and p, as well as the image center. 

However, using this algorithm, it becomes very difficult to differentiate be- 
tween hyperbolic mirrors, where d is large, and parabolic mirrors. For as d in- 
creases, the hyperboloid sheets obtained from the line images in the dual projec- 
tion will become more and more planar and estimates of d will become less and 
less precise. Nevertheless, in general, it is possible to perform calibration with 
only two line images. 

In the parabolic case we need a minimum of three line images. This is necessary
because, as we will see, the dual projection, which is perspective projection,
cannot be calibrated with any number of lines in a single frame. In this case the
surfaces of revolution are spheres whose equators are line images. Three spheres,
when intersected, yield $p$ and the image center; the third parameter, $d$, is
already assumed to be infinite. In particular, the intersection of those spheres
will be the points $(0, 0, \pm 2p)$.
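
The parabolic case can be sketched in a few lines. A sphere whose equator is a line image with center $(x_i, y_i)$ and radius $r_i$ passes through the points at height $\pm 2p$ above the image center $(u, v)$ exactly when $r_i^2 = (x_i - u)^2 + (y_i - v)^2 + 4p^2$. Subtracting these relations pairwise gives two linear equations for $(u, v)$. The code below is only an illustration under that reading; the function names and the synthetic test data are assumptions, and noise is not handled.

```python
import math

def calibrate_parabolic(circles):
    """Recover image center (u, v) and p from three line-image circles (x, y, r)."""
    (x1, y1, r1), (x2, y2, r2), (x3, y3, r3) = circles
    # Pairwise differences of r_i^2 = (x_i-u)^2 + (y_i-v)^2 + 4p^2 are linear in (u, v).
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    k1 = x1 * x1 + y1 * y1 - r1 * r1
    b1 = (x2 * x2 + y2 * y2 - r2 * r2) - k1
    b2 = (x3 * x3 + y3 * y3 - r3 * r3) - k1
    det = a11 * a22 - a12 * a21          # zero if the circle centers are collinear
    u = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - a21 * b1) / det
    p = 0.5 * math.sqrt(r1 * r1 - (x1 - u) ** 2 - (y1 - v) ** 2)
    return u, v, p

def synthetic_line_image(normal, center, p):
    """Hypothetical generator: circle for a plane normal, consistent with the relation above."""
    length = math.sqrt(sum(c * c for c in normal))
    nx, ny, nz = (c / length for c in normal)
    return (center[0] - 2.0 * p * nx / nz,
            center[1] - 2.0 * p * ny / nz,
            2.0 * p / abs(nz))

true_center, true_p = (320.0, 240.0), 100.0
circles = [synthetic_line_image(n, true_center, true_p)
           for n in [(0.2, -0.4, 0.9), (-0.5, 0.1, 0.7), (0.3, 0.6, 0.5)]]
u, v, p = calibrate_parabolic(circles)
```

With exact data the three circles reproduce the image center and $p$; with real measurements one would use more than three line images and a least-squares fit.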

It is impossible to calibrate a perspective camera with lines in a single frame 
because each line introduces two unknowns (its orientation) and two equations 
(its position in the image), and thus the number of unknowns, which is at first 
three (focal length and image center), never decreases. There are zero constraints 
on the system, and no matter how many lines we obtain we will not be able to 
calibrate the perspective camera with lines in a single image. 

5 Conclusion

Let us review the key points which we have introduced here. 

1. Every single viewpoint catadioptric system is equivalent to central projection 
to a sphere followed by projection from a point on the sphere’s axis. In 
particular, parabolic projection is equivalent to stereographic projection, and 
is therefore conformal. 

2. To every catadioptric projection there is a dual; the mapping between the
two projections takes line images to their focal points. This mapping preserves
incidence relationships.

3. Calibration of a catadioptric projection is in general possible with only two
line images; the parabolic case requires three, and perspective projection cannot
be calibrated from lines in a single image. In the parabolic case, calibration
is performed by intersecting spheres whose equators are line images.

The natural next step is to extend this theory to multiple catadioptric views 
as well as a study of robustness of scene recovery using the principles described 
herein. 


Abstract. The description of the relation between the one-parameter
groups of a group and the differential operators in the Lie-algebra of the 
group is one of the major topics in Lie-theory. 

In this paper we use this framework to derive a partial differential equa- 
tion which describes the relation between the time-change of the spectral 
characteristics of the illumination source and the change of the color pix- 
els in an image. 

In the first part of the paper we introduce and justify the usage of con- 
ical coordinate systems in color space. In the second part we derive the 
differential equation describing the illumination change and in the last 
part we illustrate the algorithm with some simulation examples. 

Keywords: group theoretical frames in robotics, vision and neurocomputing,
non-linear metrics and linearization, Lie-theory, color vision, invariance



1 Introduction and Notations 

Color constancy is traditionally treated as a static problem which in a general
form can be described as follows: Given the image $I^{(0)} = I(R, l^{(0)})$ of a scene $R$
under illumination $l^{(0)}$ and another illumination $l^{(1)}$, estimate the
image $I^{(1)} = I(R, l^{(1)})$ of the same scene under the new illumination. In the
most general case only the image data $I^{(0)}$ and the illumination $l^{(1)}$ are given. The
estimation of the relation between the scene and the illumination is part of the
problem to be solved. Many approaches to this problem have been suggested but
in this general form the problem remains unsolved today. The method described
in this paper differs in one essential point from these traditional approaches by
assuming that we can observe the scene under a continuously changing illumination:
$I(t) = I(R, l(t))$. From this image sequence we estimate the parameters
which describe the evolution of the changing illumination condition. This can
then be used to compensate the influence of the illumination change by computing
a stable image of the scene which is independent of the illumination change
under consideration.

* Reiner Lenz was supported by a grant from the CENIIT program at Linköping
University and the grant “Signal Processing with Transformation Groups” financed

After the introduction of conical coordinate systems in color space we will
show that the change of the color vector in a given image location is described by
a partial differential equation with polynomial coefficients. The unknown curve
parameters which control the illumination change enter these equations linearly
and can thus be recovered by statistical estimation techniques.

The outline of the paper is as follows: in the first part we introduce conical
coordinate systems as natural descriptors of color spaces. Then we use these
coordinates to derive the partial differential equation which describes the influence
of the illumination changes. In the last section we illustrate how to use
this description to estimate the dynamic properties of the changing
illumination. We will use the following notations: bold letters denote
vectors $\mathbf{s}$ and matrices $\mathbf{M}$, $\lambda$ is the wavelength variable and the identity matrix
will be denoted by $E$.



2 The Conical Structure of Spectral Color Spaces 



Reflectance spectra measured from color chips of the Munsell and NCS color
systems can be described by linear combinations of a few basis vectors. Usually
the eigenvectors are taken as these basis vectors [1,2,5,6,8,11,13]. For a general
spectral vector $s(\lambda)$, basis vectors $b^{(k)}(\lambda)$ and coefficients $\sigma_k$ this gives:

\[ s(\lambda) \approx \sum_{k=1}^{K} \sigma_k\, b^{(k)}(\lambda) \tag{1} \]

Collecting the coefficients $\sigma_k$ in the vector $\sigma$ and the $b^{(k)}(\lambda)$ in the matrix $B$,
Equation (1) becomes: $s \approx B\sigma$. Usually $\sigma$ is treated as an element of $\mathbb{R}^K$.
In [4] we showed, however, that these vectors $\sigma$ are located in a cone
$C = \{(\sigma_0, \sigma_1, \sigma_2) : \sigma_0^2 - \sigma_1^2 - \sigma_2^2 > 0\}$. There we used a database consisting of
reflectance spectra of 2782 color chips, 1269 from the Munsell system and the rest
from the NCS system. The eigenvectors computed from the database are collected
in the matrix $B_{KLT}$. The first three eigenvectors and the distribution of
the coefficient vectors are shown in Figure (1a) and (1b). The conical structure
of the eigenvector space is a special case of the following theorem:

Theorem 1. For every system of $N + 1$ basis vectors $B$ of unit length such
that $\min_\lambda b^{(0)}(\lambda) > 0$ we can find positive constants $b_n$, $n = 1 \ldots N$, such that the
coefficients $\sigma_n = \langle b^{(n)}, s \rangle$ satisfy:

\[ \sigma_0^2 - b_1 \sigma_1^2 - b_2 \sigma_2^2 - \ldots - b_N \sigma_N^2 > 0 \]

for all spectral vectors $s$. The coefficient vectors of all spectra are therefore located
in a cone.



Fig. 1. (a) The first three eigenvectors computed from the combined Munsell/NCS
database, (b) The distribution of the expansion coefficients of the
spectra in the Munsell/NCS database in the basis shown in (a)

To see this observe that

\[
|\langle b, s \rangle| = \left| \int b(\lambda)\, s(\lambda)\, d\lambda \right|
= \left| \int \frac{b(\lambda)}{b^{(0)}(\lambda)}\, b^{(0)}(\lambda)\, s(\lambda)\, d\lambda \right|
\le \bar{b} \int b^{(0)}(\lambda)\, s(\lambda)\, d\lambda \tag{2}
\]

where $\bar{b} = \max_\lambda |b(\lambda)| / b^{(0)}(\lambda)$ is finite since the first basis function is strictly
positive everywhere. In the following we will always assume a scaling of the basis functions
which allows us to select $b_n = 1$ for all $n$. Basis systems for which this theorem
is valid include the eigenvector system $B_{KLT}$ shown in Figure (1a) but also
the CIE-XYZ system if we select $b^{(0)} = Y$. From the positivity of the first
basis vector it follows that the coefficient $\sigma_0$ is related to the intensity of the
corresponding color spectrum. The expression

\[ \|\sigma\| = \sigma_0^2 - \sigma_1^2 - \sigma_2^2 - \ldots - \sigma_N^2 \tag{3} \]

on the other hand is a measurement of the “grayness” of the spectrum. In [4] it
was shown that $\sigma_0$, $\sigma_1^2 + \sigma_2^2$ and $\arctan(\sigma_2/\sigma_1)$ have an intuitive explanation as
intensity, saturation and hue when the basis system is computed from the eigenvectors
of the spectral database mentioned above.
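
As a small illustration of these conical coordinates, the sketch below computes the quantities named above for a coefficient vector $(\sigma_0, \sigma_1, \sigma_2)$. It assumes $N = 2$ and the scaling $b_n = 1$, and it reads the hue angle as `atan2(sigma2, sigma1)`; the function name and the dictionary keys are illustrative choices, not taken from the paper.

```python
import math

def cone_coordinates(sigma):
    """Intensity, saturation, hue and grayness of a coefficient vector (N = 2, b_n = 1)."""
    s0, s1, s2 = sigma
    grayness = s0 * s0 - s1 * s1 - s2 * s2       # Equation (3) for N = 2
    if s0 <= 0.0 or grayness <= 0.0:
        raise ValueError("coefficient vector lies outside the cone")
    return {
        "intensity": s0,
        "saturation": s1 * s1 + s2 * s2,
        "hue": math.atan2(s2, s1),               # assumed reading of the hue angle
        "grayness": grayness,
    }

coords = cone_coordinates((1.0, 0.3, 0.3))
```

For the example vector the hue is $\pi/4$ and the grayness is $0.82$; a vector with $\sigma_1^2 + \sigma_2^2 \ge \sigma_0^2$ is rejected because it lies outside the cone.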

The grayness value $\|\sigma\|$ in Equation (3) can be used as a length measure since
it is always non-negative. The linear transformations which preserve the grayness
form the group SO(1, N). In the following we will only use N = 2, for which
there are three basis functions. From general theory it is known that SO(1, 2) is
essentially the same group as SU(1, 1), which contains all complex 2×2 matrices
of the form

\[ M = \begin{pmatrix} a & b \\ \bar{b} & \bar{a} \end{pmatrix} \quad\text{with}\quad |a|^2 - |b|^2 = 1 \tag{4} \]

Application of the elements in SU(1, 1) transforms only the chromaticity part of
the colors. Therefore it is necessary to combine it with the positive real numbers
and define the group $\mathrm{SU}(1,1)^{+} = \mathbb{R}^{+} \times \mathrm{SU}(1,1)$ as the direct product. It consists
of all pairs $(\rho, M)$ where $\rho \in \mathbb{R}$ and $M \in \mathrm{SU}(1,1)$. The group operation is
defined as:

\[ (\rho_1, M_1) \circ (\rho_2, M_2) = (\rho_1 + \rho_2, M_1 M_2) \tag{5} \]

where $M_1 M_2$ denotes ordinary matrix multiplication.

3 One Parameter Groups and Differential Operators 

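Given a matrix of basis vectors $B$ and a spectrum $s$, the expansion coefficients
$\sigma_0$, $\sigma_1$ and $\sigma_2$ are obtained. Given their relation to intensity, saturation
and hue we imitate traditional color science notations and define

\[ L = \sigma_0, \quad A = \frac{\sigma_1}{\sigma_0}, \quad B = \frac{\sigma_2}{\sigma_0} \tag{6} \]

$L$ is related to the intensity and $A$, $B$ to the chromaticity of $s$. For a coordinate
vector $(L, A, B)$ we also introduce $\tau$ and $z$ as:

\[ \tau = \log L \quad\text{and}\quad z = A + iB \tag{7} \]

In the following we consider functions of the coordinate vector $\sigma$

\[ f : (\sigma_0, \sigma_1, \sigma_2) \mapsto f(\sigma_0, \sigma_1, \sigma_2) \tag{8} \]

Partial differential operators in the $\sigma$ coordinate system are denoted by $D_k$:

\[ D_k f(\sigma_0, \sigma_1, \sigma_2) = \frac{\partial f(\sigma_0, \sigma_1, \sigma_2)}{\partial \sigma_k}, \quad k = 0, 1, 2 \tag{9} \]

The group $\mathbb{R}^{+} \times \mathrm{SU}(1,1)$ acts on LAB-space by (see Equation (7)):

\[ (\rho, M)(L, A, B) = (\rho, M)(L, z) = (e^{\rho} L, Mz) = \left( e^{\rho} L,\; \frac{az + b}{\bar{b}z + \bar{a}} \right) \tag{10} \]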
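
The group operation of Equation (5) and the action of Equation (10) can be tried out directly with complex arithmetic. The following sketch is illustrative only: an element is stored as a pair `(rho, (a, b))`, and the helper `su11` (a hypothetical name) builds $a$ and $b$ from hyperbolic functions so that $|a|^2 - |b|^2 = 1$ holds by construction.

```python
import cmath
import math

def su11(t, phi, psi):
    """Entries (a, b) of an SU(1,1) matrix; |a|^2 - |b|^2 = cosh^2 t - sinh^2 t = 1."""
    return (math.cosh(t) * cmath.exp(1j * phi), math.sinh(t) * cmath.exp(1j * psi))

def compose(g1, g2):
    """Group operation (rho1, M1) o (rho2, M2) = (rho1 + rho2, M1 M2)."""
    (r1, (a1, b1)), (r2, (a2, b2)) = g1, g2
    a = a1 * a2 + b1 * b2.conjugate()            # top-left entry of M1 M2
    b = a1 * b2 + b1 * a2.conjugate()            # top-right entry of M1 M2
    return (r1 + r2, (a, b))

def act(g, point):
    """Action on (L, z): scale L by e^rho and apply the fractional linear map to z."""
    rho, (a, b) = g
    L, z = point
    return (math.exp(rho) * L, (a * z + b) / (b.conjugate() * z + a.conjugate()))

g1 = (0.2, su11(0.5, 0.3, 1.0))
g2 = (-0.1, su11(0.8, -0.4, 2.0))
x = (1.5, 0.3 + 0.2j)                            # L = 1.5, z = A + iB inside the unit disk

combined = act(compose(g1, g2), x)               # act with the product
sequential = act(g1, act(g2, x))                 # act with g2 first, then g1
```

Both results agree, and the transformed $z$ stays inside the unit disk, so chromaticity coordinates inside the cone remain inside it.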

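There is a close connection between certain subgroups and differential operators.
In the simplest case it is obtained as follows: take the subgroup
$g_0 = \{(\rho, E) : \rho \in \mathbb{R}\}$ defining the mapping $(L, z) \mapsto (\rho, E)(L, z) = (e^{\rho} L, z)$.

The subgroup $g_0$ is an example of a one-parameter subgroup of $\mathbb{R}^{+} \times \mathrm{SU}(1,1)$ since
it depends only on the parameter $\rho$. This is used to define the differential operator
$D_{g_0}$ as:

\[
\begin{aligned}
D_{g_0} f &= \left. \frac{d}{d\rho} f\big((\rho, E)(L, z)\big) \right|_{\rho=0}
= \left. \frac{d}{d\rho} f(e^{\rho} L, z) \right|_{\rho=0}
= \left. \frac{d}{d\rho} f(e^{\rho} L, e^{\rho} L A, e^{\rho} L B) \right|_{\rho=0} \\
&= L D_0 f(L, LA, LB) + LA\, D_1 f(L, LA, LB) + LB\, D_2 f(L, LA, LB) \\
&= L \left( D_0 f + A D_1 f + B D_2 f \right)
\end{aligned}
\tag{11}
\]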






which gives the operator identity:

\[ D_{g_0} = L \left( D_0 + A D_1 + B D_2 \right) = \sigma_0 D_0 + \sigma_1 D_1 + \sigma_2 D_2 \tag{12} \]

Using the same construction shows that each one-parameter subgroup g de- 
fines a differential operator Dg . The main property of these differential operators 
is collected in the following theorem: 

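Theorem 2. 1. The differential operators $D_g$ defined by the one-parameter
subgroups $g$ of $\mathbb{R}^{+} \times \mathrm{SU}(1,1)$ form a vector space of dimension four. This
vector space is known as the Lie-algebra of the Lie-group $\mathbb{R}^{+} \times \mathrm{SU}(1,1)$. It will be
denoted by $\mathfrak{su}(1,1)^{+}$.

2. Four basis vectors are defined through the following one-parameter groups:

\[ g_0 = \{ (\rho, E) : \rho \in \mathbb{R} \} \tag{13} \]

\[ g_1 = \left\{ \left( 0, \begin{pmatrix} \cosh\frac{\alpha}{2} & \sinh\frac{\alpha}{2} \\ \sinh\frac{\alpha}{2} & \cosh\frac{\alpha}{2} \end{pmatrix} \right) \right\} = \{ (0, M_1(\alpha)) \} \tag{14} \]

\[ g_2 = \left\{ \left( 0, \begin{pmatrix} \cosh\frac{\alpha}{2} & i \sinh\frac{\alpha}{2} \\ -i \sinh\frac{\alpha}{2} & \cosh\frac{\alpha}{2} \end{pmatrix} \right) \right\} = \{ (0, M_2(\alpha)) \} \tag{15} \]

\[ g_3 = \left\{ \left( 0, \begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix} \right) \right\} = \{ (0, M_3(\alpha)) \} \tag{16} \]

3. The corresponding differential operators are:

\[ D_{g_0} = \sigma_0 D_0 + \sigma_1 D_1 + \sigma_2 D_2 \tag{17} \]

\[ D_{g_1} = \frac{\sigma_0^2 - \sigma_1^2 + \sigma_2^2}{2\sigma_0}\, D_1 - \frac{\sigma_1 \sigma_2}{\sigma_0}\, D_2 \tag{18} \]

\[ D_{g_2} = -\frac{\sigma_1 \sigma_2}{\sigma_0}\, D_1 + \frac{\sigma_0^2 + \sigma_1^2 - \sigma_2^2}{2\sigma_0}\, D_2 \tag{19} \]

\[ D_{g_3} = -\sigma_2 D_1 + \sigma_1 D_2 \tag{20} \]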
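
The relation between one-parameter subgroups and differential operators can be checked numerically. The sketch below is an illustration, not part of the original text. It assumes the four subgroups are parametrized as $\rho = t$ for $g_0$ and by matrices with entries $\cosh(t/2)$, $\sinh(t/2)$, $i\sinh(t/2)$ and $e^{\pm it/2}$ for $g_1, g_2, g_3$, and that they act on $\sigma = (L, LA, LB)$ through $(L, z) \mapsto (e^{\rho}L, (az+b)/(\bar{b}z+\bar{a}))$. It compares a central finite difference of each curve $t \mapsto g(t)\sigma$ at $t = 0$ with the coefficient functions that multiply $D_0, D_1, D_2$ in the corresponding operator.

```python
import cmath
import math

def subgroup(k, t):
    """Element (rho, a, b) of the k-th one-parameter subgroup (assumed parametrization)."""
    if k == 0:
        return (t, 1 + 0j, 0j)
    if k == 1:
        return (0.0, complex(math.cosh(t / 2)), complex(math.sinh(t / 2)))
    if k == 2:
        return (0.0, complex(math.cosh(t / 2)), 1j * math.sinh(t / 2))
    return (0.0, cmath.exp(1j * t / 2), 0j)

def act(element, sigma):
    """Action on sigma = (L, L*A, L*B) via (L, z) -> (e^rho L, (a z + b)/(conj(b) z + conj(a)))."""
    rho, a, b = element
    s0, s1, s2 = sigma
    z = complex(s1, s2) / s0
    w = (a * z + b) / (b.conjugate() * z + a.conjugate())
    L = math.exp(rho) * s0
    return (L, L * w.real, L * w.imag)

def operator_coefficients(k, sigma):
    """Coefficients of (D0, D1, D2) in the operator of the k-th subgroup."""
    s0, s1, s2 = sigma
    if k == 0:
        return (s0, s1, s2)
    if k == 1:
        return (0.0, (s0 * s0 - s1 * s1 + s2 * s2) / (2 * s0), -s1 * s2 / s0)
    if k == 2:
        return (0.0, -s1 * s2 / s0, (s0 * s0 + s1 * s1 - s2 * s2) / (2 * s0))
    return (0.0, -s2, s1)

def numeric_derivative(k, sigma, h=1e-6):
    """Central difference of t -> g_k(t) sigma at t = 0."""
    plus = act(subgroup(k, h), sigma)
    minus = act(subgroup(k, -h), sigma)
    return tuple((p - q) / (2 * h) for p, q in zip(plus, minus))

sigma = (1.4, 0.3, -0.5)                         # a point inside the cone
max_error = max(abs(n - e)
                for k in range(4)
                for n, e in zip(numeric_derivative(k, sigma),
                                operator_coefficients(k, sigma)))
```

Under these assumptions the finite differences and the operator coefficients agree to within the discretization error.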

More information about this construction and other basic facts on Lie-groups 
and Lie-algebras can be found in books on Lie-theory (such as [10,7,12,14]). 

Each one-parameter subgroup $g$ of $\mathbb{R}^{+} \times \mathrm{SU}(1,1)$ defines a curve in $(\sigma_0, \sigma_1, \sigma_2)$
space and by differentiation an operator $D_g$. This is an element of the
Lie-algebra of this group, and thus there are constants $\alpha_0, \ldots, \alpha_3$ such that:

\[ D_g = \alpha_0 D_{g_0} + \alpha_1 D_{g_1} + \alpha_2 D_{g_2} + \alpha_3 D_{g_3} \tag{21} \]

Now assume that the function $f(\sigma_0(t), \sigma_1(t), \sigma_2(t))$ describes our measurements
varying over some period of time. Assume furthermore that this variation
(over a sufficiently small period of time) originates in a one-parameter subgroup $g$
of $\mathbb{R}^{+} \times \mathrm{SU}(1,1)$. From these measurements we can compute the derivative

\[ \left. \frac{d f(\sigma_0(t), \sigma_1(t), \sigma_2(t))}{dt} \right|_{t=0} = D_g f \tag{22} \]

The operator $D_g$ is an element of the Lie-algebra and thus there are constants
$\alpha_0, \ldots, \alpha_3$ for which

\[ D_g f = \alpha_0 D_{g_0} f + \alpha_1 D_{g_1} f + \alpha_2 D_{g_2} f + \alpha_3 D_{g_3} f \tag{23} \]

All of the quantities $D_g f$, $D_{g_k} f$ ($k = 0, \ldots, 3$) can be computed from the measured
data and the unknown constants $\alpha_0, \ldots, \alpha_3$ can therefore be estimated.

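Sometimes it is useful to do the same calculations in LAB-space, as described
in the next theorem:

Theorem 3. A basis of the Lie-algebra in LAB coordinates is given
by the operators:

\[ D_{g_0} = L D_L \tag{24} \]

\[ D_{g_1} = \tfrac{1}{2}\left( 1 - A^2 + B^2 \right) D_A - (AB)\, D_B \tag{25} \]

\[ D_{g_2} = -(AB)\, D_A + \tfrac{1}{2}\left( 1 + A^2 - B^2 \right) D_B \tag{26} \]

\[ D_{g_3} = -B D_A + A D_B \tag{27} \]

where $D_L$, $D_A$, $D_B$ are the differential operators:

\[ D_L g = \frac{\partial g(L, A, B)}{\partial L}, \quad D_A g = \frac{\partial g(L, A, B)}{\partial A}, \quad D_B g = \frac{\partial g(L, A, B)}{\partial B} \]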

4 Application to Illumination Invariant Recognition 

We will now show how the general theory can be used to recover character- 
istic illumination parameters from a sequence of images taken under changing 
illumination conditions. 

Our simple model of the imaging process combines the illumination spectrum
$l(\lambda)$, the reflectance spectrum at position $x$ given by $r(x, \lambda)$ and the
spectral characteristic of an imaging sensor (for example a camera) $c^{(\kappa)}(\lambda)$ as:

\[ m_\kappa(x) = \int l(\lambda)\, r(x, \lambda)\, c^{(\kappa)}(\lambda)\, d\lambda \tag{28} \]

where $m_\kappa(x)$ is the value measured with sensor number $\kappa$ at position $x$. In most
cases the $m_\kappa(x)$ are the measured R, G, B values and $c^{(\kappa)}(\lambda)$ is the spectral
sensitivity of the R, G or B channel.

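We use different basis vectors $b^{(r)}_\nu(\lambda)$, $b^{(l)}_\mu(\lambda)$, $\nu, \mu = 0, 1, 2$, in the spaces of
object reflectance and illumination spectra. For a distribution $r(x, \lambda)$ and an
illumination spectrum $l(\lambda)$ we get the coordinate vectors $\sigma^{(r)}(x)$, $\sigma^{(l)}$ leading
to the approximation:

\[ r(x, \lambda) \approx \sum_{\nu=0}^{2} b^{(r)}_\nu(\lambda)\, \sigma^{(r)}_\nu(x), \qquad l(\lambda) \approx \sum_{\mu=0}^{2} b^{(l)}_\mu(\lambda)\, \sigma^{(l)}_\mu \tag{29} \]

Inserting these approximations into Equation (28) gives:

\[
\begin{aligned}
m_\kappa(x) &\approx \int \sum_{\nu=0}^{2} \sum_{\mu=0}^{2} b^{(r)}_\nu(\lambda)\, \sigma^{(r)}_\nu(x)\, b^{(l)}_\mu(\lambda)\, \sigma^{(l)}_\mu\, c^{(\kappa)}(\lambda)\, d\lambda \\
&= \sum_{\nu=0}^{2} \sum_{\mu=0}^{2} \sigma^{(r)}_\nu(x)\, \sigma^{(l)}_\mu \int b^{(r)}_\nu(\lambda)\, b^{(l)}_\mu(\lambda)\, c^{(\kappa)}(\lambda)\, d\lambda
\end{aligned}
\tag{30}
\]

The known matrix of integrals $\int b^{(r)}_\nu(\lambda)\, b^{(l)}_\mu(\lambda)\, c^{(\kappa)}(\lambda)\, d\lambda$ characterizes channel $\kappa$ of the sensor. The coordinate
vectors $\sigma^{(r)}(x)$, $\sigma^{(l)}$ are in general unknown. The estimation of the coordinate
vector(s) of the reflectance spectra involved is not the topic of this paper and
therefore we assume that we can estimate the vectors $v_\kappa(x)$ with components
$(v_\kappa(x))_\mu = \sum_\nu \sigma^{(r)}_\nu(x) \int b^{(r)}_\nu(\lambda)\, b^{(l)}_\mu(\lambda)\, c^{(\kappa)}(\lambda)\, d\lambda$.
A general discussion of this type of bilinear calibration-estimation problems can
be found in [3].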

Simplifying notations we omit the superscript $(l)$ and get for the measurements:

\[ m_\kappa(\sigma) = v_\kappa(x)' \sigma = \langle v_\kappa(x), \sigma \rangle \tag{31} \]

The measurements considered as functions of the illumination define special cases
of the functions defined in Equation (8). We write

\[ f(\sigma_0, \sigma_1, \sigma_2) = f_{\kappa,x}(\sigma_0, \sigma_1, \sigma_2) = \langle v_\kappa(x), \sigma \rangle = m_\kappa(\sigma) \tag{32} \]

Observing the same scene point with the same camera under changing illumination
conditions produces a measurement series $m_{\kappa,x}(\sigma, t)$ which is the raw input
data. From this we compute the time derivative

\[ m'_{\kappa,x} = \frac{d\, m_{\kappa,x}(\sigma, t)}{dt} \tag{33} \]

Theorem 2 shows that the mapping $m \mapsto m'$ defines a differential operator
which is a linear combination of the known differential operators $D_{g_k}$ with
unknown coefficients $\alpha_k$. The algorithm to recover the unknown illumination
parameters is thus as follows:

1. Select a number of points in the image.

2. For each point/sensor combination $\pi = (x, \kappa)$ measure the sequence
$f_{\kappa,x}(\sigma_0(t), \sigma_1(t), \sigma_2(t)) = f_\pi$.

3. For each point/sensor combination $\pi$ compute the derivative

\[ m'_\pi = D_g f_\pi = \left. \frac{d f_\pi(\sigma_0(t), \sigma_1(t), \sigma_2(t))}{dt} \right|_{t=0} \]

4. Collect all values in the vector $M = (m'_\pi)$.

5. For each point/sensor combination $\pi$ compute the derivatives

\[ D_{g_k} f_\pi = u_{k\pi} \]

for $k = 0, \ldots, 3$ and collect them in the matrix $U$.

6. Between each row $U_\pi$ of $U$ and the corresponding element $m'_\pi$ in $M$ there
is a relation

\[ m'_\pi = U_\pi \alpha \]

where the vector $\alpha$ collects the unknown coefficients $\alpha_0, \ldots, \alpha_3$.

7. The unknown coefficient vector can be estimated by solving in a statistical
sense the equation

\[ M = U\alpha \tag{34} \]
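
The steps above can be sketched with synthetic data. The code below is a minimal illustration under several assumptions that are not taken from the paper: the measurements are the linear functions $f_\pi(\sigma) = \langle v_\pi, \sigma \rangle$, the coefficient functions of the four operators are those obtained by differentiating the four one-parameter subgroups at the identity, rows from several illumination states are stacked (rows taken at a single state $\sigma$ span at most three dimensions, so they cannot determine four coefficients), and the least-squares problem is solved through the normal equations. All names and numbers are illustrative.

```python
def operator_fields(sigma):
    """Coefficients of (D0, D1, D2) for the four basis operators at sigma (assumed forms)."""
    s0, s1, s2 = sigma
    return [
        (s0, s1, s2),
        (0.0, (s0 * s0 - s1 * s1 + s2 * s2) / (2 * s0), -s1 * s2 / s0),
        (0.0, -s1 * s2 / s0, (s0 * s0 + s1 * s1 - s2 * s2) / (2 * s0)),
        (0.0, -s2, s1),
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def estimate_alpha(U, M):
    """Least-squares solution of M = U alpha via the normal equations U'U alpha = U'M."""
    n = len(U[0])
    UtU = [[sum(row[i] * row[j] for row in U) for j in range(n)] for i in range(n)]
    UtM = [sum(row[i] * m for row, m in zip(U, M)) for i in range(n)]
    return solve_linear(UtU, UtM)

# Synthetic experiment: known coefficients, three illumination states, three sensors.
alpha_true = [0.3, -0.2, 0.5, 0.1]
sigmas = [(1.0, 0.0, 0.0), (1.2, 0.6, 0.0), (0.9, 0.1, 0.3)]
sensors = [(1.0, 0.2, 0.1), (0.8, -0.5, 0.3), (0.6, 0.1, -0.7)]

U, M = [], []
for sigma in sigmas:
    fields = operator_fields(sigma)
    # Velocity of sigma under the operator with coefficients alpha_true.
    velocity = [sum(alpha_true[k] * fields[k][i] for k in range(4)) for i in range(3)]
    for v in sensors:
        U.append([dot(v, fields[k]) for k in range(4)])   # u_{k pi} = D_{g_k} f_pi
        M.append(dot(v, velocity))                        # m'_pi = D_g f_pi

alpha_est = estimate_alpha(U, M)
```

With noise-free synthetic data the estimate reproduces `alpha_true`; with real measurements the derivatives in `M` would come from finite differences between frames, and a weighted or otherwise more robust solver may be preferable.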

5 Experiments 

In the experiments we tested the algorithm by simulating a sequence of im- 
ages which show the same scene under changing illumination conditions. These 
experiments show that two factors are of importance in the application of the 
framework developed so far. 

- The first factor is the selection of basis vectors in the space of illumination
and reflectance spectra. It is of advantage to use different basis systems in
the two spaces since prior knowledge about the general nature of the spectra
involved can be used in the selection of these basis vectors.

- Furthermore it has to be decided how the estimates from different points
and different channels should be combined. Treating all estimates equally
ignores the fact that some estimates are less reliable than others.

In the simulations we use multispectral images described in [9] . For a given 
multispectral image we use the spectral characteristics of a CCD-camera to sim- 
ulate the color image captured by the camera. The basis vectors in the space of 
reflection functions are the eigenvectors computed from the Munsell and NCS 
systems described above. The basis vectors in the space of illumination spectra 
are the eigenvectors derived by generating a random mixture of 1000 spectra 
consisting of the CIE-light sources A, B, C, D65, a flat spectrum, 5 measured 
daylight spectra and 3 artificial daylight spectra. 

In a typical experiment we generate a time-varying series of illumination
spectra as follows: at time t = 0 the illumination light is characterized by the 
flat illumination spectrum corresponding to white light. At time t = 1 the il- 
lumination is given by a pre-defined spectrum. We then simulate the changing 
illumination condition by connecting these two spectra by a one-parameter sub- 
group of SU(1, 1) . In a common simulation we use the CIE-A source as the light 
source at time t = 1. An image sequence consists of 3 to 10 frames corresponding 
to time parameters in the range t = 0.2, . . . 1.1. 

In the experiment illustrated in Figure (2a) we use 3 frames with time parameters
t = 0.2, 0.3, 0.4. The light source at t = 1 is the A-source and in the
estimation 10 random points were tracked. The scene used is named “inlab2”
in the database. In the figure four spectra are shown. The original spectrum
is shown as a dotted line. The basis vectors for the space of illumination spectra
were computed by a principal component analysis of 1000 randomly generated
spectra as described above. The approximation of the original spectrum of the
A-source in this basis system is shown as a solid line in the figure. This is the
best we can achieve within the chosen system. The difference between the dotted
and the solid line originates in the difference between the three-dimensional
PCA-based description and the original.

The dashed and the dash-dotted lines show two estimations from the image
sequence. The dashed estimation was obtained using the mean over all estimations.
In that case all observations are treated equally. The dash-dotted
estimation was obtained using a weighted sum. For each point $x$ we compute
first the determinant of the matrix formed by the vectors $v_\kappa(x)$ introduced in
Equation (31). Low values of this determinant indicate that small variations in
the measurement vector can lead to large deviations in the estimation of the
parameter vector. Estimations based on observations of such a point are thus
considered unreliable and weighted down. The figure shows that in this case the
weighting leads to a considerable improvement of the estimation result. This is
confirmed by the results shown in Figure (2b). In this series of simulations we
used three input scenes (Inlab2, Ashton2 and Rwood) and tracked 5 points over
three frames. This was done 100 times for each scene. The estimated spectrum
was then compared with the approximation of the original spectrum of the A-source
in the coordinate system used in the simulation. The errors for the mean-based
estimation were sorted and are shown as the solid line in Figure (2b). The
corresponding errors for the weighted estimation are shown
as an ’x’ for each simulation. The diagram shows a clear improvement of the
estimation results for the weighted mean based experiments.



6 Conclusions 

We demonstrated the use of Lie-techniques in the estimation of illumination spectra
from dynamical image sequences. This illustrates one application in which
methods from the theory of Lie-groups and Lie-algebras can be used in the analysis
of time-varying image sequences. Other problems involving different groups,
like the Euclidean motion group, can be analysed along the same lines.

Fig. 2. (a) Estimation results using Inlab2 with 10 points tracked over 3 frames
(b) Comparison of estimation errors obtained by mean and weighted mean estimation


Abstract. The work reported in this paper applies group theory to
characterize local surface contact among solids, and makes such a
formalization computationally tractable. This paper serves three purposes:
(1) to report a new treatment of contacting surfaces as oriented sets and
to verify the consistency of their corresponding symmetry groups in a
group theoretical formalization framework; (2) to give a concise summary
of this group theoretical approach, from theory, to algorithms, to
applications; (3) to pinpoint some unsolved problems and possible future
directions.



1 Introduction 

Contact motion analysis of rigid bodies is one of the most useful yet difficult
topics in robotics, design, assembly planning and manufacturing. The increasing level
of automation demands more effective computational methods for representing
and reasoning about contacts, and for transforming high-level specifications into
low-level executable commands. Admittedly, contact analysis is “a computational
bottleneck in mechanical design”, and “it is especially challenging for curved parts
with multiple, changing contacts” [5, 6]. The current state of the art in this area
leaves much to be desired. The input to almost all the reported automatic assembly
planning systems, such as [24, 25, 16, 4], is one static state of the final assembly
configuration, regardless of whether the assembly is meant to be rigid or articulated. Most
work in contact analysis deals with planar surfaces [22, 17, 7]. While the
most impressive work on higher pairs¹ analysis and simulation [5, 21, 2] still
needs human intervention, there exists no formal theory or algorithm for contact
analysis of rigid bodies in general.

Hervé [3] and Popplestone [19] are among the few pioneers who contributed
to the group theoretical formalization of mechanical engineering practice. Hervé
has introduced a rational classification of mechanisms by applying the theory
of continuous groups. Since each lower pair allows a set of relative motions of
two coupled bodies, these motions can be regarded as subgroups of $\mathcal{E}^+$. The
number of independent variables required to define the relative position of two
coupled links is referred to as their degree of freedom, which can be extended to
a subgroup of $\mathcal{E}^+$ in 3-space, the corresponding concept being that of dimension.
In [3], Hervé gave a table (Table III) listing the intersection and composition of
continuous subgroups of the proper Euclidean group. Only pairs of continuous
subgroups are considered. In [19] Popplestone relates robotics and group theory;
in particular he pointed out that 1) the symmetry group of a feature (of a solid) is
perhaps more important than that of the whole solid, and 2) not only continuous
groups (as treated in [3]) but also finite and discrete groups should be handled.

¹ Lower pairs are mechanical joints that have surface-to-surface contact (area
contact); otherwise they are called higher pairs (line/curve/point contact) in
mechanical engineering.

In Liu’s Ph.D. thesis [9,11], the group theoretical formalization of surface
contact among solids was further extended and solidified. The novelties in her
work include: (1) both finite and continuous groups are treated under a uniform
general formalization; (2) a simple geometric algorithm for combining different
symmetry groups of local contacting surfaces, to determine relative motions
of the contacting solids, is developed and implemented with proven tractability;
(3) the approach carries through: starting from CAD models of solids, followed
by automatic determination of relative locations/motions of multi-contact
assembly parts, to the generation of assembly procedures for robots [12-14]. Liu’s
notations and algorithms can deal with a symmetry group which is itself a semidirect
product of a continuous group and a discrete group, thus providing the power
of reasoning about both degrees of freedom and combinatoric configurations of
contacting solids simultaneously. A list of some important canonical subgroups
of $\mathcal{E}^+$ treated in [9] is given in Table 1. Further development after [9] includes:



Table 1. Some important subgroups of $\mathcal{E}^+$

Canonical group      Definition
-------------------  ------------------------------------------------------------
$G_{id}$             $\{1\}$
$T^1$                gp{trans(0, 0, z) | z ∈ ℝ}
$T^2$                gp{trans(x, y, 0) | x, y ∈ ℝ}
$T^3$                gp{trans(x, y, z) | x, y, z ∈ ℝ}
$SO(3)$              gp{rot(i, θ) rot(j, σ) rot(k, φ) | θ, σ, φ ∈ ℝ}
$SO(2)$              gp{rot(k, θ) | θ ∈ ℝ}
$O(2)$               gp{rot(k, θ) rot(i, nπ) | θ ∈ ℝ, n ∈ N}
$G_{cyl}$            gp{trans(0, 0, z) rot(k, θ) rot(i, nπ) | n ∈ N, θ, z ∈ ℝ}
$G_{dir\text{-}cyl}$ gp{trans(0, 0, z) rot(k, θ) | z, θ ∈ ℝ}
$G_{plane}$          gp{trans(x, y, 0) rot(k, θ) | x, y, θ ∈ ℝ}
$G_{screw}(p)$       gp{trans(0, 0, z) rot(k, 2zπ/p) | z ∈ ℝ}
$G_{T^1 C_2}$        gp{trans(0, 0, z) rot(i, nπ) | n ∈ N, z ∈ ℝ}
$D_{2n}$             gp{rot(k, 2π/n) rot(i, mπ) | m, n ∈ N}
$C_n$                gp{rot(k, 2π/n) | n ∈ N}
$\mathcal{E}^+$      gp{trans(x, y, z) rot(i, θ) rot(j, σ) rot(k, φ) | x, y, z, θ, σ, φ ∈ ℝ}

transforming high-level spatial relation descriptions to low-level compliant
motions [15]; treating each oriented surface on a solid as a primitive feature; and
obtaining an explicit expression for general contact relationships in terms of the
symmetry groups of the contacting surfaces. The last two parts are described
in this paper. We are now ready to go beyond surface contact, to venture into
more general algebraic surfaces, higher pairs and group products. Different from
the study of solids in local contact, e.g., [8, 18], our aim is to have a precise and
complete description of the intended, possibly articulated, final assembly
configuration of solids where each part usually has multiple contacts with the rest of
the assembly; and our approach is algebraic in nature. Our work also differs from
[20, 23] in that a group theoretical formalism is embedded in a concise representation
of contact motions not involving extensive algebraic equation manipulation.

2 Basic Group Theoretical Formalism of Contact 
Analysis 

Since contacts among solids happen via the contacts of the surfaces of the solids, 
the representation and characterization of each surface constitutes the founda- 
tion of any formalization for solid contacts. 



2.1 Oriented Surface and Its Symmetry Group

The surfaces which we have treated mathematically as subsets of $\mathbb{R}^3$ [9, 11] have
no intrinsic inside and outside. To remedy this we introduce the concept of
oriented features by defining a set of outward-pointing normal vectors for each
surface point of a solid. The polynomial used to express an algebraic surface
implicitly defines such normal vectors. Let $S^2$ be the unit sphere at the origin
embedded in $\mathbb{R}^3$; each point of $S^2$ corresponds to a unit vector in $\mathbb{R}^3$.

Definition 2.1.1 An oriented primitive feature $F = (S, \rho)$ of a solid $M$ is
an oriented surface where

1) $S \subset \Re^3$ is a connected, irreducible* and continuous algebraic surface which
partially or completely coincides with one or more finite oriented faces of $M$;

2) $\rho \subseteq S \times S^2$ is a continuous relation. For each $s \in S$, if $s$ is a non-singular
point of surface $S$ (p. 78 [1]) then $v \in S^2$ is one of two opposing normals of
the tangent plane at point $s$ such that $(s, v) \in \rho$; if $s$ is a singular point of $S$
(e.g. at the apex of a cone) then, for all $v$, where $v \in S^2$ is the limit of the
orientations of its neighborhood, $(s, v) \in \rho$.

3) For all $s \in M$, $(s, v) \in \rho$, $v$ points away from $M$.

Intuitively speaking, a feature is composed of both “skin”, $S$, and “hair”, the
set of normal vectors which correspond to the points on $S^2$. Each element of
relation $\rho$ is a correspondence between a point on $S$ and a vector on $S^2$. Note
that there may be more than one “normal vector” at one point of a surface, e.g. at
the apex of a cone-shaped surface.

* Here irreducible implies that a primitive feature cannot be composed of any other
more basic surfaces.

Let $\mathcal{E}^+$ be the proper Euclidean group, which contains all the rotations and
translations in $\Re^3$, and let $T^3$ be the maximal translation subgroup of $\mathcal{E}^+$.

Definition 2.1.2 Any isometry $g = tr$ of $\mathcal{E}^+$, $t \in T^3$, $r \in SO(3)$, acts on $\rho$ in
such a way that $(s, v) \in \rho \Leftrightarrow (gs, rv) \in g * \rho$.

Definition 2.1.3 An isometry $g \in \mathcal{E}^+$ is a proper symmetry of a feature
$F = (S, \rho)$ if and only if $g(S) = S$ and $g * \rho = \rho$.

Note that there is an extra demand on a symmetry of an oriented feature: it
has to preserve the orientations $\rho$ of the feature as well as the point set $S$. Since
orientations are points on $S^2$, symmetries of an oriented feature have to keep
two sets of points in $\Re^3$ setwise invariant. One can prove† that the symmetries
of an oriented surface form a group.
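
The two conditions of Definition 2.1.3 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes an infinite cylinder of radius 1 about the z-axis with outward normals, and it only tests the conditions on a finite random sample of surface points, so a positive answer is evidence rather than proof. All function names are hypothetical.

```python
import math
import random

def rot_z(theta):
    """Rotation about the z-axis, rot(k, theta) in the notation of the table."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    """Rotation about the x-axis, rot(i, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def apply(r, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]

def sample_oriented_cylinder(radius=1.0, n=200, seed=0):
    """Random (point, outward unit normal) pairs on the cylinder x^2 + y^2 = radius^2."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        z = rng.uniform(-5.0, 5.0)
        normal = [math.cos(phi), math.sin(phi), 0.0]
        point = [radius * normal[0], radius * normal[1], z]
        samples.append((point, normal))
    return samples

def preserves_oriented_cylinder(r, t, radius=1.0, tol=1e-9):
    """Check on samples that g = t r maps (s, v) to a pair (gs, rv) of the same feature."""
    for s, v in sample_oriented_cylinder(radius):
        gs = [a + b for a, b in zip(apply(r, s), t)]   # g acts on the point
        rv = apply(r, v)                               # only r acts on the normal
        if abs(math.hypot(gs[0], gs[1]) - radius) > tol:
            return False                               # gs left the surface
        outward = [gs[0] / radius, gs[1] / radius, 0.0]
        if any(abs(a - b) > tol for a, b in zip(rv, outward)):
            return False                               # orientation not preserved
    return True
```

Under these assumptions, a rotation about the axis combined with a translation along it passes the test, as does a half-turn about the x-axis, while a sideways translation fails. This matches the generators listed for $\mathcal{G}_{cyl}$.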



2.2 Multiple Contacts: Compound Surface Features and Their 
Symmetry Groups 

An assembly or mechanism is a manifestation of multiple surface interactions
of its subparts, regardless of the physical properties of each individual part (rigid or
deformable) or the nature of the contact (static or articulated). Thus the
representation of an assembly or mechanism reduces to specifying a set of
contact constraints which dictate the configuration of a set of solids. Given two
solids $B_1$ and $B_2$ in contact via surfaces $F_1$ and $F_2$ respectively, the relative
motions of $B_2$ with respect to $B_1$ can be expressed as:

$l_1^{-1} l_2 \in f_1 G_1 G_2 f_2^{-1}$    (1)

where $l_1^{-1} l_2$ is the relative position of solid 2 w.r.t. solid 1, $G_1, G_2$ are the symmetry
groups of $F_1$ and $F_2$ respectively, $l_1, l_2$ specify the locations of the solids in
the world coordinate system, and $f_1$ and $f_2$ specify the locations of $F_1, F_2$ in their
respective body coordinates. A more specific form for the relative positions of
two solids under n surface contacts (two contacting surfaces coincide),

$l_1^{-1} l_2 \in f_1 G f_2^{-1}$    (2)

shows clearly that the possible motions of a solid or a subassembly $S$ in an
assembly are described precisely by the symmetry group $G$ of the multiple
contacting oriented surfaces of $S$. Note that $G$ here is usually not the symmetry group
of the whole solid/subassembly $S$ but the symmetry group of those surfaces of $S$
that are in contact with other solids. If $G$ is the identity group, then $l_1^{-1} l_2 = f_1 f_2^{-1}$
gives a fixed position for $S$. If $G$ is a finite rotation group, then $f_1 G f_2^{-1}$ contains
a finite number of positions, reflecting the existence of discrete symmetries in the
collection of contacting surfaces. If $G$ is a continuous group then there exist
continuous relative motions between $S$ and the rest of the assembly. Only in this
case can one speak of the degrees of freedom of such a configuration.

† Due to space limits, we only give results without proofs. Interested readers can find
details in the references listed.
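
To make Equation (2) concrete for a finite group, the sketch below enumerates the set $f_1 G f_2^{-1}$ using 4x4 homogeneous matrices. It is an illustration under stated assumptions, not part of the original system: $G$ is taken to be the cyclic group $C_n$ about the z-axis, and the feature locations `f1`, `f2` in the usage are arbitrary hypothetical values.

```python
import math

def matmul(a, b):
    """Product of two 4x4 homogeneous transformation matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def trans(x, y, z):
    """Pure translation trans(x, y, z)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def rot_k(theta):
    """Rotation rot(k, theta) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(m):
    """Inverse of a rigid motion [R t; 0 1], which is [R^T  -R^T t; 0 1]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    nt = [-sum(rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [rt[0] + [nt[0]], rt[1] + [nt[1]], rt[2] + [nt[2]],
            [0.0, 0.0, 0.0, 1.0]]

def cyclic_group(n):
    """The n elements of C_n about the z-axis."""
    return [rot_k(2.0 * math.pi * j / n) for j in range(n)]

def relative_positions(f1, group, f2):
    """All relative positions f1 g f2^{-1} for g in a finite group."""
    f2_inv = rigid_inverse(f2)
    return [matmul(matmul(f1, g), f2_inv) for g in group]

def count_distinct(poses, tol=1e-9):
    """Number of poses that differ by more than tol in some matrix entry."""
    distinct = []
    for p in poses:
        same = any(all(abs(p[i][j] - q[i][j]) < tol
                       for i in range(4) for j in range(4)) for q in distinct)
        if not same:
            distinct.append(p)
    return len(distinct)
```

For example, with `f1 = trans(1, 0, 0)` and `f2 = matmul(trans(0, 2, 0), rot_k(0.3))`, the call `relative_positions(f1, cyclic_group(4), f2)` yields four distinct poses, while the identity group `cyclic_group(1)` yields the single fixed position $f_1 f_2^{-1}$.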

Let us first give a notation for such a collection of contacting surfaces
from a single solid, and then determine what effect the symmetry group of this
surface-set will impose on the relative motions of contacting solids.

Definition 2.2.1 A compound feature $F = (S, \rho)$ of primitive features $F_1 = (S_1, \rho_1)$,
..., $F_n = (S_n, \rho_n)$ is defined by $S = S_1 \cup \ldots \cup S_n$ and $\rho = \rho_1 \cup \ldots \cup \rho_n$.

Pairwise Relationship of Oriented Features In order to determine the 
symmetry group of a compound feature systematically, we start with the simplest 
compound feature — a compound feature composed of only one pair of primitive 
features. See Figures 1 (a), 1 (b) and 1 (c) for examples of these simple
compound features (note that only a finite face on each primitive feature is
drawn).

Given a pair of primitive features, what kind of relationship holds between the 
two features and what is the effect of such a relationship in terms of determining 
their symmetry group? The following definition gives a characterization of four 
relationships between a pair of primitive features: 

Definition 2.2.2 Two oriented primitive features $F_1 = (S_1, \rho_1)$, $F_2 = (S_2, \rho_2)$
are said to be

- Distinct: if for any open subsets $S_1' \subseteq S_1$, $S_2' \subseteq S_2$, no $g = tr \in \mathcal{E}^+$ exists
such that $g(S_1') \subseteq S_2$ or $g(S_2') \subseteq S_1$. See Figure 1 (a) for an example of a pair
of distinct features $F_1, F_2$.

- 1-congruent: if there exists at least one $g \in \mathcal{E}^+$ such that $g(S_1) = S_2$ and
$g * \rho_1 = \rho_2$, but for all such $g$, $g(S_2) \neq S_1$. For an example see Figure 1 (b).
Another example is two parallel planar surfaces with normal vectors pointing
in the same direction.

- 2-congruent: if there exists $g_c \in \mathcal{E}^+$ such that $g_c(S_1) = S_2$, $g_c(S_2) = S_1$,
$g_c * \rho_1 = \rho_2$ and $g_c * \rho_2 = \rho_1$. For an example, consider two parallel cylindrical
surfaces having the same radius and normal vectors pointing away from their
center lines, as in Figure 1 (c). Two parallel planar surfaces with normal
vectors pointing in opposite directions are also an example of a pair of
2-congruent features.

- Complementary: if there exists $g \in \mathcal{E}^+$ such that $g(S_1) = S_2$ and
$g * \rho_1 = -\rho_2$, where $-\rho_2 = \{(s, -v) \mid (s, v) \in \rho_2\}$; in other words,
$\forall (s, v) \in g * \rho_1, \exists (s, -v) \in \rho_2$, and $\forall (s, v) \in \rho_2, \exists (s, -v) \in g * \rho_1$.
See Figure 1 (d) for an example.

It is easy to verify that these relationships are symmetric relations. We can
also prove that this characterization exhaustively enumerates all
the possible cases between a pair of oriented primitive features.

Proposition 2.2.3 Distinct, 1-congruent, 2-congruent and complementary are
the only possible relationships between a pair of primitive features.

Corollary 2.2.4 Except for a pair of planar surface primitive features, distinct,
1-congruent, 2-congruent and complementary relationships are mutually exclusive
relations between a pair of primitive features.

Proposition 2.2.5 If features $F_1 = (S_1, \rho_1)$, $F_2 = (S_2, \rho_2)$ are complementary
of each other, where $a(S_1) = S_2$, $a \in \mathcal{E}^+$, and $G_1, G_2$ are the symmetry groups of
$F_1, F_2$ respectively, then $a G_1 a^{-1} = G_2$. In particular, if $S_1 = S_2$ then $G_1 = G_2$
(the necessary condition for surface contact).



Fig. 1. Four types of surface relations between a pair of solids: (a) distinct,
(b) 1-congruent, (c) 2-congruent, (d) complementary. Each panel shows features
$F_1 = (S_1, \rho_1)$ and $F_2 = (S_2, \rho_2)$ together with the orientation vectors of $S_1$ and $S_2$.



Symmetry Group of Multiple Oriented Surfaces In the next few propositions
we shall explore how the symmetry group of a compound feature is
expressed by the symmetry groups of its component primitive features. The first
case we consider is when a compound feature $F$ is composed of n pairwise distinct
features.

Proposition 2.2.6 Given a compound feature $F = (S, \rho)$ of primitive features
$F_1 = (S_1, \rho_1), \ldots, F_n = (S_n, \rho_n)$, where $F_1, \ldots, F_n$ are pairwise distinct primitive
features with symmetry groups $G_1, \ldots, G_n$ respectively, the symmetry group
$G$ of $F$ is $G = G_1 \cap \ldots \cap G_n$.

Proposition 2.2.7 Let a compound feature $F = (S, \rho)$ be composed of a pair
of primitive features $F_1 = (S_1, \rho_1)$ and $F_2 = (S_2, \rho_2)$ which are 1-congruent of
each other. If $G_1, G_2$ are the symmetry groups of $F_1, F_2$ respectively, and $G$ is
the symmetry group of $F$, then $G = G_1 \cap G_2$.

Proposition 2.2.8 Let a compound feature $F = (S, \rho)$ be composed of a pair of
primitive features $F_1$ and $F_2$ which are 2-congruent of each other via $g_c$
(Definition 2.2.2). If $F_1 = (S_1, \rho_1)$, $F_2 = (S_2, \rho_2)$ have symmetry groups $G_1, G_2$
respectively, and $G$ is the symmetry group of $F$, then $G = \langle g_c \rangle (G_1 \cap G_2)$,
where $\langle g_c \rangle$ denotes the subgroup of $\mathcal{E}^+$ generated by $g_c$.

In general, the symmetry group G of a compound feature F can be found 
from the intersection of the symmetry groups Gi of its primitive features. The 
only exception is when a mapping which flips a pair of 2-congruent features in 
F is also a member of G. These kinds of mappings are new symmetries that do 
not exist in any Gi and the new group they generate is a discrete group. 
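
As a toy instance of the intersection rule for pairwise distinct features, the sketch below intersects finite cyclic rotation groups that share one axis. This is an assumption made for illustration only: it is not the geometric-invariant algorithm of [9, 10], and it handles neither continuous groups nor groups about different axes. Each rotation is stored as an exact fraction of a full turn, so the set intersection is exact.

```python
from fractions import Fraction
from math import gcd

def cyclic_rotation_group(n):
    """C_n about a fixed axis, as the set of rotation angles in fractions of a turn."""
    return {Fraction(j, n) for j in range(n)}

def intersect(*groups):
    """Intersection G_1 ∩ ... ∩ G_n of groups given as sets of elements."""
    result = set(groups[0])
    for g in groups[1:]:
        result &= g
    return result
```

For instance, `intersect(cyclic_rotation_group(4), cyclic_rotation_group(6))` equals `cyclic_rotation_group(2)`; in general the intersection of $C_m$ and $C_n$ about a common axis has `gcd(m, n)` elements.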

2.3 Surface Contact: Computationally Tractable 

As one can observe from the results stated above for surface contact, the intersection
of the symmetry groups of the primitive features is one of the crucial operations
in determining the relative motions of contacting solids. We face two
computational problems: (1) How can symmetry groups, which may be finite,
infinite, discrete or continuous, be represented on computers? (2) How can intersections of
subgroups of $\mathcal{E}^+$ be computed efficiently? We have successfully implemented an
efficient TR subgroup‡ intersection algorithm using a geometric-invariant
representation of the groups [9, 10] (Figures 2 and 3). The basic symmetry group of each
surface of a solid is obtained by a straightforward mapping from the boundary
(surface) file of the solid to the respective canonical symmetry group. The
group theoretical formalization of surface contact has also been embedded into
an assembly planning system KA3 (Figure 4).

Fig. 2. A geometric representation, characteristic invariants, for the subgroups of the
Euclidean group. The diagram relates the algebraic domain to the geometric domain:
a mapping from subgroups to geometric invariants, geometric operations, and the
inverse of the mapping.

‡ A symmetry group $G$ (a set of motions) that can be divided into a translation
subgroup $T$ and a rotation subgroup $R$, i.e. a semidirect product $G = TR$. We call
such a group a TR subgroup of $\mathcal{E}^+$.






Yanxi Liu 



Fig. 3. Left: TR group intersection algorithm using translation and rotation invariants
(noted in the figure as an $O(n^2)$ algorithm, where n is the number of countable poles).
Right: an example of symmetry group intersection (symmetry groups of planar and
cylindrical surfaces)



Fig. 4. An assembly planner KA3 takes high-level task specifications and generates
robot executable assembly instructions. Legible labels in the diagram: “Solid 1 fits
solid 2, and solid 2 fits solid 3”; possible assembly configurations; assembly-motion
analysis; assembly configurations; attach symmetry group to each surface feature;
partially ordered assembly motion sets.
Fig. 5. (A) A five-part gearbox assembly. (B) The top view of the gearbox configuration,
automatically computed by KA3. (C) Representation of the gearbox assembly in terms
of contacting compound feature symmetry groups, see text for details. (D) The output
POAMS for the gearbox assembly. Each solid $S_i$ is associated with a sequence of
motions in terms of the symmetry group of its contacting surfaces with the rest of the
assembly. Notice that during disassembly each solid goes through a 2-surface contact
($SO(2)$, assembled configuration), a 1-surface contact ($\mathcal{G}_{dir-cyl}$) and finally no contact
($\mathcal{E}^+$, disassembled configuration).



As an example of assembly specification using symmetry groups, see Figure
5 for a five-part gearbox. The representation of the assembly is shown in Figure
5(C), where $L_i$, $i = 1..4$, is the symmetry group of the contacting compound
feature between solids $S_i$ and $S_5$. $L_i = a_i SO(2) a_i^{-1}$, $i = 1..4$; $SO(2)$ is a
one-degree-of-freedom rotation group resulting from the intersection of the symmetry group of a plane
with that of a cylinder (the compound feature composed of two surfaces of the
shaft of a gear). $L_{ij} = L_i L_j = a_{ij} SO(2) b_{ij} SO(2) c_{ij}$, $i, j = 1..4$, indicates that the
relative positions between gears (non-surface contact) are simply determined by
rotations in $SO(2)$ and some specific translations $a_{ij}, b_{ij}, c_{ij}$, in which the relative
gear pitch ratio is also embedded. This representation of the gearbox (Figure
5(C)) specifies precisely the articulated gearbox assembly. After a disassembly
analysis that moves one solid at a time away from this contacting network (Figure 5(C)),
a partially ordered assembly motion set (POAMS) is found (Figure 5
3 General Contact and Computational Issues




Fig. 6. (A) Solids $S_1$ and $S_2$ have n parallel contacts. (B) Solids $S_1, S_2, \ldots, S_m$ form a
sequential contact chain



We have divided the general contact motion among solids into surface contact
(lower pairs), which we have successfully treated computationally, and
non-surface contact (higher pairs). They have the following specific forms:

1. Two solids have n surface contacts; the relative position of solid 2 with respect
to solid 1 is

$l_1^{-1} l_2 \in f_1 G f_2^{-1}$    (3)

where $G$ is the symmetry group of the compound feature composed of all
the contact primitive features of $S_1$ or $S_2$. If $F$ is composed of n pairwise
distinct features $F_i$, then $G = \bigcap G_i$, where $G_i$ is the symmetry group of $F_i$.

2. Two solids have n general contacts (Figure 6 (A)); the relative position of
solid 2 with respect to solid 1 is

$l_1^{-1} l_2 \in f_{11} G_{11} \sigma_1 G_{21} f_{21}^{-1} \cap f_{12} G_{12} \sigma_2 G_{22} f_{22}^{-1} \cap \ldots \cap f_{1n} G_{1n} \sigma_n G_{2n} f_{2n}^{-1}$    (4)

where $G_{ij}$ is the symmetry group of primitive feature $j$ of $S_i$ and $f_{ij}$ is its
feature coordinates.

3. m solids have a chaining general contact (Figure 6 (B)); the relative location
of solid m with respect to solid 1 is

$l_1^{-1} l_m \in f_{12} G_{12} \sigma_1 G_{21} f_{21}^{-1} \; f_{23} G_{23} \sigma_2 G_{32} f_{32}^{-1} \cdots f_{(m-1)m} G_{(m-1)m} \sigma_{m-1} G_{m(m-1)} f_{m(m-1)}^{-1}$    (5)

where $G_{ij}$ is the symmetry group of the surface on solid $i$ in contact with
solid $j$.

Encouraged by our existing work on making the group theoretical formalization
of surface contact (case 1 above) tractable, our goal is to seek the computational
means to deal with general contact using the group theoretical formalization
(cases 2 and 3).

Many open problems remain: (1) Can the geometric representation used for
TR groups be used for group products? It is known that, in general, a product
of groups is not a group. For TR groups, the TR-restriction presented in [9,
10] provides us with a constructive way to judge from its invariants whether a
product remains a TR group. However, we do not yet know how to compute a
product from the invariants of its subgroups. (2) Given an algebraic surface, how
can one find its symmetry group computationally? We intend to investigate the
possibility of using the Gröbner basis of a given polynomial to find its symmetry
group. An alternative is to use semi-algebraic sets to represent the contacting
surfaces and find the continuous symmetries using Lie algebra, but under this
formalism it is not clear how to deal with discrete symmetries effectively. (3)
Given a set of n algebraic surfaces and their respective symmetry groups, what
is the exact algorithm to find the symmetry group for the whole set? [9-11]
have only proved results for subcases, i.e. n distinct surfaces, or a pair of 1-congruent or
2-congruent surfaces. No proven result for the most general case yet exists. We are
going to study further those compound features with more complicated inner
structures. For example, one may define a concept of n-congruence on n features
$F_1, \ldots, F_n$ as requiring that there exists $g \in \mathcal{E}^+$ such that $g(F_i) = F_{(i \bmod n)+1}$;
this is a natural extension of 2-congruence. Such congruences will give rise to
new symmetries of the compound feature.

4 Conclusion 

This work provides a good example of applying algebra to solve a fundamental 
problem in robotics: solids in contact. It reflects both the power of group theory,
and the effort one has to expend to make a mathematical theory computation- 
ally feasible. We establish a group theoretical formalization of general contact 
motion. The generality of this approach allows the treatment of solids in contact 
as subgroup manipulations, and provides a uniform computational platform for 
both continuous and discrete groups. The next challenge is to construct tools en- 
abling computations of higher pairs, which will make this work computationally 
complete. Our previous work has shed some light on the feasibility of achieving 
this goal. 
Abstract. In this paper, the mathematical theory of wallpaper groups
is used to construct a computational tool for symmetry analysis of periodic
patterns. Starting with a novel peak detection algorithm based on
“regions of dominance”, an input periodic pattern can be automatically
classified into one of the 17 wallpaper groups. The orbits of stabilizer
subgroups within the group lead to a small set of candidate motifs that
exhibit local symmetry consistent with the global symmetry of the entire
pattern. We further consider affine distorted periodic patterns and
show that each such pattern can be classified into a small set of symmetry
groups that describe the patterns’ potential symmetries under affine
transformation.



1 Introduction 

Symmetry is pervasive in both natural and man-made environments. Humans 
have an innate ability to perceive symmetry, but it is not obvious how to auto- 
mate this powerful insight. It is a continuous effort of the authors to find proper 
computational tools for dealing with symmetry. Symmetries of periodic patterns 
in a plane are of particular interest in computer vision. This is because the sym- 
metry group of a pattern is independent of scale, absolute color, lighting, density 
and orientation/position of the pattern. Periodic patterns can be found in regu- 
lar textures, indoor and outdoor scenes (e.g. brick walls, tiled floors, wallpapers, 
ceilings, clothes, windows on buildings, cars in a parking lot), or in intermediate 
data representations (e.g. periodicity analysis of human and animal gaits in the 
spatio-temporal domain).

A mature mathematical theory for periodic patterns has been known for over 
a century [1,2]. For monochrome planar periodic patterns, there are seven frieze
groups for 2D patterns repeated along one dimension, and seventeen wallpaper 
groups describing patterns extended by two linearly independent translational 
generators. Despite an infinite variety of instantiations, this finite set of sym- 
metry groups completely characterizes the possible structural symmetry of any 
periodic pattern. 

We have developed a computational model of periodic pattern perception 
composed of: generating the underlying translational lattice from the image of 
a periodic pattern, classifying the symmetry group of the periodic pattern, and 



G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 241-250, 2000.
© Springer-Verlag Berlin Heidelberg 2000







Yanxi Liu and Robert T. Collins 



identifying the preferred “motif” of the pattern. Our work, initially inspired by
[7], appears to be the first to use the theory of frieze and wallpaper groups
for automated analysis of periodic patterns, although there exist flowcharts and
computer programs that allow humans to interactively generate and identify
periodic patterns for educational purposes [8,3]. Due to space limitations, this
paper concentrates only on wallpaper groups. Furthermore, we assume that the
translational lattice of the 2D periodic pattern has already been extracted. The
reader is referred to [5] for our algorithm for performing robust lattice
extraction. Figure 1 shows one sample result produced by this algorithm.




Fig. 1. An oriental rug image and A) its autocorrelation surface, B) peaks found using
a global threshold, C) peaks extracted using the threshold-free method of Lin, et al. [4], 
D) the highest 32 peaks from those returned by Lin, et al., E) the 32 most-dominant 
peaks found using our approach described in [5]. 



2 Symmetry Group Classification under Euclidean 
Transformations 

A 2D repeated or periodic pattern has the following property: there exists a finite 
region bounded by two linearly independent translations which, when acted upon 
by the group generated by the translations, produces simultaneously a covering 
(no gaps) and a packing (no overlaps) of the original image [7, 2]. The smallest
such bounded region is called a unit of the pattern or lattice unit, since the 
translational orbit of any single point on the plane is a lattice. A symmetry of 
a subset S of Euclidean space is an isometry that keeps S setwise invariant. 
All symmetries of S form the symmetry group of S under composition. It 
has been proven that there are seventeen wallpaper groups (Figure 2) describing 
patterns extended by two linearly independent translational generators [7,2]. 
Mathematically, wallpaper groups are defined only for infinite patterns that cover 
the whole plane. In practice, we analyze a periodic pattern P of a finite area, 
and use the phrase “symmetry group G of P” to mean that G is the symmetry 
group of the infinite periodic pattern that has P as a finite patch. 

Figure 2 depicts unit lattices for the 17 distinct wallpaper groups (from [7]).
Each unit is characterized in terms of its translation generators, rotation,
reflection and glide-reflection symmetries. The two linearly independent translations
of minimum length are the two basic generators of each group, and they
construct a lattice for the group. Even though the variety of pattern instantiations
is endless, the underlying relationship between translation, rotation, reflection
and glide-reflection in any 2D periodic pattern must conform to one of these
seventeen cases.

Fig. 2. The generating regions for the 17 wallpaper groups (from [7])

Since a symmetry of a 2D periodic pattern has to map the lattice associated
with the pattern onto itself, i.e., map centers of rotation to new centers
of rotation having the same order, the only possible rotation symmetries are
2, 3, 4, 6-fold rotations. This restriction is often referred to as the crystallographic
restriction. Furthermore, reflection axes can only be oriented parallel, diagonal,
or perpendicular to the lattice translation vectors. Under these constraints,
there are only five possible lattice unit shapes: (1) parallelogram (two groups:
p1, p2), (2) rectangular (five groups: pm, pg, pmm, pmg, pgg), (3) rhombic (two
groups: cm, cmm), (4) square (three groups: p4, p4m, p4g) and (5) hexagonal (five
groups: p3, p3m1, p31m, p6, p6m). All lattice units are parallelograms. Rectangular
units have angles of 90°. Rhombic units have equal-length edges. Square units
are a special case of both (2) and (3), and hexagonal units are a special case of
(3).
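
The five unit shapes and their associated groups can be written down as a small lookup. The sketch below is only an illustration under stated assumptions: it classifies the parallelogram spanned by two given edge vectors using a hypothetical relative tolerance `tol`, and it does not search for an alternative basis of the same lattice. The function and dictionary names are not from the paper.

```python
import math

# Wallpaper groups listed in the text for each lattice unit shape.
LATTICE_GROUPS = {
    "parallelogram": ["p1", "p2"],
    "rectangular": ["pm", "pg", "pmm", "pmg", "pgg"],
    "rhombic": ["cm", "cmm"],
    "square": ["p4", "p4m", "p4g"],
    "hexagonal": ["p3", "p3m1", "p31m", "p6", "p6m"],
}

def unit_shape(t1, t2, tol=1e-6):
    """Classify the lattice unit spanned by the 2D edge vectors t1 and t2."""
    len1, len2 = math.hypot(*t1), math.hypot(*t2)
    # |cos| treats the 60 and 120 degree corners of the same unit alike.
    cos_angle = abs(t1[0] * t2[0] + t1[1] * t2[1]) / (len1 * len2)
    equal_edges = abs(len1 - len2) <= tol * max(len1, len2)
    right_angle = cos_angle <= tol
    if equal_edges and right_angle:
        return "square"
    if equal_edges and abs(cos_angle - 0.5) <= tol:
        return "hexagonal"
    if right_angle:
        return "rectangular"
    if equal_edges:
        return "rhombic"
    return "parallelogram"
```

A call such as `LATTICE_GROUPS[unit_shape((2.0, 0.0), (0.0, 1.0))]` then returns the five candidate groups of a rectangular unit. Because square and hexagonal units are special cases of the more general shapes, the most specific shape is tested first.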

We have constructed an algorithm that can automatically classify which
symmetry group a 2D periodic pattern under Euclidean transformations belongs to.
The practical value of understanding the 17 wallpaper groups is that correct
pattern classification can be performed after verifying the existence of only a small
set of rotation and/or reflection symmetries. Table 1 lists the eight symmetries
checked in the classification algorithm. Each group corresponds
to a unique sequence of values listed in Table 1, so the groups are mutually
exclusive. The presence of a specific rotation, reflection or glide-reflection
symmetry is determined by applying the symmetry to be tested to the
entire pattern, then checking the similarity between the original and transformed
images.
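
The apply-and-compare test described above can be sketched on a small square array. This is a toy version under stated assumptions, not the authors' implementation: the pattern is a list of rows, the transformations act about the array center, the similarity measure is a normalized correlation coefficient, and the acceptance threshold of 0.9 is an arbitrary illustrative value.

```python
import math

def rotate180(img):
    """2-fold rotation about the array center."""
    return [row[::-1] for row in img[::-1]]

def rotate90(img):
    """4-fold (quarter-turn) rotation of a square array about its center."""
    n = len(img)
    return [[img[n - 1 - c][r] for c in range(n)] for r in range(n)]

def reflect(img):
    """Reflection about the vertical axis through the array center."""
    return [row[::-1] for row in img]

def similarity(a, b):
    """Normalized correlation coefficient between two equally sized arrays."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 1.0   # constant images are trivially alike

def has_symmetry(img, transform, threshold=0.9):
    """Apply the candidate symmetry and compare the result with the original."""
    return similarity(img, transform(img)) >= threshold
```

For example, the array `[[1, 2, 3], [4, 5, 4], [3, 2, 1]]` passes the test for `rotate180` but fails it for `rotate90` and `reflect`.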



Table 1. Wallpaper group classification: numbers 2, 3, 4 or 6 denote n-fold rotational
symmetry, Tx (or Dx) denotes reflectional symmetry about one of the translation (or
diagonal) vectors of the unit lattice. “Y” means that the symmetry exists for that
particular symmetry group; empty space means no. Y(g) denotes a glide reflection.
3 Extracting Representative Motifs

Although other work has addressed detection of the translational lattice of a
periodic pattern, ours is the first to seek a principled method for determining a
representative motif. The issue here is that consideration of translational
symmetry alone fixes the size, shape and orientation of the lattice, but leaves open
the question of where the lattice is located in the image. Any offset of the lattice
carves the pattern into a set of identical tiles, but these tiles typically provide
no computational insight, and appear nonintuitive to a human observer
(Figure 3). Choosing a good motif should help one see, from a single tile, what the
whole pattern looks like. From work in perceptual grouping, it is known that
the human perceptual system often has a preference for symmetric figures. Our
contribution in this section is to show how a small set of tiles can be chosen, in
a principled way, such that the symmetry of the pattern fragments on them is
maximized.

If we entertain the idea that the most representative motif is the one that
is most symmetrical, one plausible strategy for generating motifs is to align the
motif center with the center of the highest-order of point symmetry in the
pattern. This is the point fixed by the largest stabilizer subgroup of the symmetry
group of the pattern. If we choose the centers of the highest order of rotational
symmetries, candidate motifs can then be determined systematically by
enumerating each distinct center point of the highest-order rotation. Two rotation
centers are distinct if they lie in different orbits of the symmetry group, that is,
if one cannot be mapped into the other by applying any translation, rotation,
reflection or glide-reflection symmetries in its own symmetry group.

Fig. 3. (A) and (B) show an automatically extracted lattice and the tile that it implies.
The tile is not a good representation of the pattern motif. (C) and (D) show the lattice
position in terms of one of the three most-symmetric motifs found for the oriental rug
image. The latter was generated automatically by an algorithm that analyzes pattern
symmetry based on knowledge of the 17 wallpaper groups.

Figure 3 shows an example of an automatically extracted lattice, and an
arbitrary tile that it carves out, followed by three symmetrical tiles centered on
2-fold rotation centers. More examples and explanations can be found in [5].

4 Symmetry Group Classification Under Affine
Transformations

When a 2D pattern undergoes a rigid transformation, its symmetry group
remains unchanged. Strictly speaking, its symmetry group is conjugated by the
transformation that acts on the pattern. Since there exists a bijection between the original
symmetry group and the conjugated symmetry group, the two groups are con- 
sidered equivalent (isomorphic). If one imagines a coordinate system fixed on 
the pattern, the translation, rotation, reflection and glide-reflection symmetries 
are unchanged under this coordinate system when the pattern is undergoing 
rigid transformations. This situation will no longer be true when the pattern un- 
dergoes a non-rigid transformation. However, certain symmetries of a periodic 
pattern may survive. 
4.1 Wallpaper Group Transition Matrix

If $g$ is a symmetry of a 2D periodic pattern $P$, then by the definition of symmetry
$g(P) = P$. Let $T(g(P)) = T(P)$, where $T$ is a transformation. Then
$T(g(T^{-1}(T(P)))) = T(P) \Rightarrow TgT^{-1}(T(P)) = T(P)$. A useful question to ask is: does $TgT^{-1}$ remain
a symmetry of $T(P)$? The answer, of course, depends on what $g$ and $T$ are.
The answer is “yes” if (1) $T$ is a similarity transformation (a proof that under a
similarity transformation a periodic pattern keeps the same
symmetry group); (2) $T$ is an affine transformation and $g$ is either a
translation (a proof that a periodic pattern remains a periodic pattern under affine
transformation) or a 2-fold rotation; (3) $g$ is a reflection (glide-reflection)
and $T$ is a non-uniform scaling parallel or perpendicular to $g$’s reflection axis.
Relevant proofs can be found in [6]. Based on these proven results, we can
construct a 17x17 wallpaper group transition matrix (Table 2) that dictates
how the symmetry group of a periodic pattern can be transformed into other
groups under non-rigid transformations. It turns out that only certain groups
can be associated with a pattern under affine distortions. This matrix leads to a
new way of evaluating an affine deformation of a periodic pattern: we should not only
consider the symmetry group of the pattern as given, but also all the possible
symmetry groups that can be associated with that pattern when it is transformed
affinely. Table 2 tells us that these transitions form well-defined small, finite
orbits. For example, there are two large orbits of the 17 groups: the p1-orbit and
the p2-orbit. This comes from the fact that a 2-fold rotation always survives any
nonsingular affine distortion. Figure 4 shows one example of symmetry group
transition as a pattern undergoes a series of affine deformations.
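
The three “yes” cases can be checked numerically on the linear parts of $g$ and $T$. The sketch below rests on one assumption about what “remains a symmetry” means: $TgT^{-1}$ always maps $T(P)$ onto itself, so the code tests whether its linear part is still orthogonal, i.e. still an isometry. The helper names and the sample matrices in the usage are illustrative, not taken from the paper.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(m):
    """Inverse of a nonsingular 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def conjugate(t, g):
    """Linear part of T g T^{-1}."""
    return matmul2(matmul2(t, g), inverse2(t))

def is_isometry(m, tol=1e-9):
    """True if m^T m is the identity, i.e. m is a rotation or reflection."""
    mt = [[m[j][i] for j in range(2)] for i in range(2)]
    p = matmul2(mt, m)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

REFLECT_X = [[1.0, 0.0], [0.0, -1.0]]   # reflection about the x-axis
```

With a general affine part such as `[[2.0, 1.0], [0.5, 3.0]]`, the conjugate of `rotation(math.pi)` is still an isometry while that of `rotation(math.pi / 2)` is not. `REFLECT_X` survives the axis-aligned scaling `[[2.0, 0.0], [0.0, 0.5]]` but not the shear `[[1.0, 1.0], [0.0, 1.0]]`.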



4.2 Symmetry Group Classification Algorithm 

When the 2D pattern undergoes an affine transformation that preserves the
shortest vector property, the same Euclidean algorithm (Table 1) can be applied
for determining the lattice unit and classifying its symmetry group*. From Table
2, only those entries with P need to be further checked for possible “higher
symmetries”.

The implementation of this idea is carried out as follows: once the lattice unit
is decided, the input unit lattice is simultaneously deformed into a hexagonal
lattice and a square lattice, with the pattern deformed accordingly. Hexagonal
and square lattices are the most symmetrical lattices, therefore these
deformations allow the most symmetrical potential patterns to form. Meanwhile, the
original symmetries of the pattern are guaranteed to be preserved under at least
one of these two deformations, because hexagonal and square lattices are special
cases of the more general lattices (rhombic, rectangular and parallelogram, see
Section 2). The group classification procedure can then proceed in the same way
as stated in Section 2. A diagram version of the algorithm is shown in Figure 5.

* When the affine distortion is so large that the nearest neighboring lattice points no
longer form the boundary of a proper generating region, additional information is
needed to locate the lattice unit. This includes finding an axis of skewed symmetry,
which is beyond the scope of this paper.

Table 2. Wallpaper Group Transition Matrix

Empty entries mean that there exist no transformations between the two groups (to
the left and to the top). S: similarity transformation, N: non-uniform scaling ⊥ or ∥ to
all reflection axes in the group to the left, A: general affine transformation other than
S or N, and P: possible affine transformation (pattern dependent).



4.3 Symmetry Group Classification Experimental Results 

We have successfully processed all seventeen wallpaper group patterns.^ Here we
provide one example to illustrate how our algorithm works.

The first step is to determine the underlying translational lattice structure of
the original image, in the form of two independent generating vectors $t_1$ and $t_2$.
Since we are assuming that the wallpaper pattern has been previously isolated,
the lattice points are determined by finding significant peaks in the pattern's
autocorrelation surface (Figure 6a-c). The lattice of dots is decomposed into two
generating vectors by finding the two shortest difference vectors $t_1$ and $t_2$ such
that the angle between them is between 60 and 90 degrees. The second step
involves transforming the lattice to a square grid, aligned with the horizontal
and vertical axes (Figure 6d-f). This is performed by applying an affine
transformation to the image and its autocorrelation surface. The transformation used is
the unique affine transform leaving the origin (0,0) fixed and taking $t_1$ to $(L, 0)$



^ For a more complete set of results on all 17 wallpaper groups see [6]. 
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p4m cmm 




Fig. 4. The original periodic pattern has symmetry group cmm (middle). Its symmetry 
group migrates to different groups within its orbit in the wallpaper group transition 
matrix (Table 2) while the pattern is being affinely transformed. 



and $t_2$ to $(0, L)$, where $L$ is the larger of the two generating vector lengths $\|t_1\|$
and $\|t_2\|$.
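
The lattice normalization described here amounts to solving for a 2x2 matrix from two vector correspondences. The following is a minimal sketch under that reading, not the authors' implementation; the function names `affine_from_lattice`, `apply` and `square_and_hex_maps` are hypothetical, and the hexagonal target $(L/2, L\sqrt{3}/2)$ is the one the text uses later for the hexagonal lattice.

```python
import math


def affine_from_lattice(t1, t2, target1, target2):
    """Return the 2x2 matrix M (origin fixed) with M t1 = target1 and M t2 = target2."""
    det = t1[0] * t2[1] - t1[1] * t2[0]
    if abs(det) < 1e-12:
        raise ValueError("generating vectors must be linearly independent")
    # Inverse of the matrix whose columns are t1 and t2.
    inv = [[t2[1] / det, -t2[0] / det],
           [-t1[1] / det, t1[0] / det]]
    # Matrix whose columns are the two target vectors.
    tgt = [[target1[0], target2[0]],
           [target1[1], target2[1]]]
    return [[sum(tgt[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def square_and_hex_maps(t1, t2):
    """Maps taking the lattice (t1, t2) to a square and to a hexagonal lattice."""
    L = max(math.hypot(*t1), math.hypot(*t2))  # larger generating vector length
    to_square = affine_from_lattice(t1, t2, (L, 0.0), (0.0, L))
    to_hex = affine_from_lattice(t1, t2, (L, 0.0), (L / 2, L * math.sqrt(3) / 2))
    return L, to_square, to_hex
```

In practice the same matrix would be used to resample the image and its autocorrelation surface; that resampling step is left out of the sketch.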

After transforming to a square lattice, a square generating region (with
dimensions L × L) is cropped from the transformed image. This is used as a
template, the rotated and reflected versions of which are correlated with the
transformed image to determine what, if any, type of rotation and reflection
symmetry it has. In the location determined by the highest correlation peak, a
match score between the rotated/reflected template and the image is computed
as the mean of the absolute difference between corresponding intensity values.
The lower the value of this match score, the more likely it is that the image
has that particular rotational/reflectional symmetry. This yields a set of
"typical" match scores for that pattern; the mean and standard deviation of these
scores are used as an adaptive threshold tailored for this pattern. Match scores
associated with rotated/reflected templates are compared to this threshold to
determine whether that particular symmetry holds.
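
The match score and threshold test can be sketched on a small square tile stored as a list of rows. This is an illustration only: the correlation search for the best alignment is omitted, the helper names are hypothetical, and the rule "accept if the score is at most mean + k standard deviations of the typical scores" is an assumption, since the text does not state how the two statistics are combined.

```python
def rot90(tile):
    """Rotate a square tile (list of rows) by 90 degrees counter-clockwise."""
    n = len(tile)
    return [[tile[c][n - 1 - r] for c in range(n)] for r in range(n)]


def flip_horizontal(tile):
    """Reflect a tile about its vertical axis (reverse every row)."""
    return [row[::-1] for row in tile]


def match_score(a, b):
    """Mean absolute difference between corresponding intensity values."""
    n = len(a)
    return sum(abs(a[r][c] - b[r][c]) for r in range(n) for c in range(n)) / (n * n)


def mean_std(values):
    """Mean and (population) standard deviation of a list of scores."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5


def has_symmetry(score, typical_scores, k=2.0):
    """Assumed decision rule: score within mean + k * std of the typical scores."""
    m, s = mean_std(typical_scores)
    return score <= m + k * s


tile = [[1, 2],
        [2, 1]]
rot180_score = match_score(tile, rot90(rot90(tile)))   # two-fold rotation
reflect_score = match_score(tile, flip_horizontal(tile))
```

For this toy tile the 180 degree rotation reproduces the tile exactly, while the horizontal reflection does not, so only the first score falls below any reasonable threshold.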

An example is shown in Figure 6. The processed values for both square and
hexagonal lattices are shown below:

rot180 rot120 rot90 rot60 T1 refl T2 refl D1 refl D2 refl
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We find that the pattern only has two-fold rotation symmetry when represented 
using a square lattice grid, which signifies group p2 (Table 1). To transform 
the image to a hexagonal lattice structure, the affine transformation is used that 
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Fig. 5. An algorithm for symmetry group classification of 2D periodic pattern under 
affine transformation: Y(glide) means the reflection symmetry must be a non-trivial
glide reflection. Y(n) / N(n) means the test result is positive/negative and n is the
number of possible symmetry groups that need to be further distinguished.



leaves the origin (0,0) fixed while mapping $t_1$ to $(L, 0)$ and $t_2$ to $(L/2, L\sqrt{3}/2)$,
where L is a length chosen as before. The row labeled "hexag" in the table shows
rotation and reflection results for the hexagonally transformed pattern. We see
that now, in addition to two-fold symmetry, the pattern also has 60 and 120
degree rotational symmetry. The pattern is uniquely classified as being from
the p6 wallpaper symmetry group (Table 1). One can also verify this transition
between p6 and p2 in the wallpaper group transition matrix (Table 2).
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Fig. 6. (a) original image, (b) autocorrelation of image, (c) detected lattice points, 
(d) transformed image, (e) transformed autocorrelation, (f) transformed lattice points, 
now a square grid, (g) hexagonal transformed image, (h) transformed autocorrelation, 
(i) transformed lattice points, now a hexagonal grid. 













Yanxi Liu and Robert T. Collins 



5 Conclusion 

We propose a computational model for periodic pattern perception based on 
the mathematical theory of crystallographic groups, in particular, the wallpaper 
groups. This mature mathematical theory provides principled guidelines for
analyzing and classifying periodic patterns, and for extracting a pattern's visually
meaningful building blocks, namely motifs. This computational model has been
implemented and tested on both synthetic and real-world images of periodic 
patterns. We hypothesize that symmetric tiles form good candidates for human 
and machine periodic pattern perception. 

More importantly, an understanding of the potential symmetry group
transitions of a periodic pattern undergoing affine transformation opens a door for
us to apply this method to new problems, such as texture perception and
replacement, localization, robot navigation, and human perceptual organization,
among others.
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Abstract. A new point of view for wavelet filters is presented. This 
leads to a description of wavelet filters in terms of certain linearly
independent basic filters which can be designed to construct wavelets with
special properties. Furthermore, it is shown that this approach makes
explicit closed-form descriptions of higher-order Daubechies wavelet
filters (at least for $D_8$ and $D_{10}$) possible, which were inaccessible
before. Additionally, some biorthogonal examples are discussed and, finally,
a conceptual generalization to the two-dimensional case is given.



1 Introduction 

Since its introduction in the early 1980s, wavelet analysis has had a deep
impact on nearly all tasks of signal processing as well as computer vision
applications and related questions (e.g. image compression, feature detection,
optic flow estimation, treatment of PDEs). Though the one-dimensional theory
has grown rapidly in the last two decades, there are several open questions
concerning the general, multidimensional wavelet theory, for example the lack of
factorization theorems like the Fejér-Riesz lemma, which makes the design of
scaling (and wavelet) filters with desirable properties in more than one
dimension quite tricky. The aim of this paper is the presentation of a framework for
one-dimensional scaling filter design, which can be easily generalized to higher
dimensions and may therefore help to overcome some of the existing problems.
The reason for this is the fact that a direct design method is used, which is
independent of factorization questions. Finally, we mention that similar
results were presented in the article [AHC93], which the author was unaware
of during the first writing of this text. However, in [AHC93] the concept of
linear independence was not used and no multidimensional generalization was
intended; additionally, the closed-form descriptions for higher-order maximally
flat orthogonal wavelet filters are a new contribution (although they are mostly
of theoretical interest).

* The author is supported by the Deutsche Forschungsgemeinschaft (DFG) 
within the Graduiertenkolleg 357. 
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2 Basic Material 

All of the following investigations are restricted to wavelets that come from a
dyadic multiresolution analysis (however, the generalization to arbitrary integer
dilations is straightforward). Consider a scaling function $\varphi \in L^2(\mathbb{R})$ and suppose
that the related scaling filter symbol $m_0(\omega)$ is given by

\[ m_0(\omega) = \sum_{k=0}^{n} \gamma_k e^{-ik\omega}, \qquad \gamma_k \in \mathbb{R}. \]

To yield an orthonormal basis for $L^2(\mathbb{R})$, the symbol has to satisfy the
orthogonality criterion

\[ 1 = |m_0(\omega)|^2 + |m_0(\omega+\pi)|^2
     = 2 \cdot \sum_{k=0}^{n} \gamma_k^2 + 4 \cdot \sum_{k=1}^{(n-1)/2} \sum_{j=0}^{n-2k} \gamma_j \gamma_{j+2k} \cdot \cos(2k\omega). \]

From this, we can directly derive the following $(n+1)/2$ constraint equations of
order two:

\[ \sum_{j=0}^{n} \gamma_j = 1 \quad\text{and}\quad \sum_{j=0}^{n-2k} \gamma_j \cdot \gamma_{j+2k} = 0, \qquad k = 1, \ldots, (n-1)/2. \tag{1} \]

For most applications, wavelets with a sufficiently high regularity and a number of
vanishing moments (this gives polynomial reproducibility) are desirable.
Moreover, it is a well-known fact that both of the mentioned properties are in some
sense connected to the zero order, say $m$, of the scaling filter symbol $m_0(\omega)$ at the
aliasing frequency $\omega = \pi$. These zeros can be characterized by the Strang-Fix
conditions or sum rules of order $m-1$, that is

\[ \sum_{k=0}^{n} (-1)^k k^l \gamma_k = 0 \qquad\text{for } l = 0, 1, \ldots, m-1. \]

From the constraints in (1) one easily shows that an orthogonal scaling filter
symbol of length $n+1$ can have at most a zero of order $(n+1)/2$ at $\pi$.
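
Both conditions of this section can be checked numerically on a short filter. The sketch below uses the well-known four-tap Daubechies filter, normalized so that its coefficients sum to one; this example filter and the helper names are our choice and do not come from the text.

```python
import cmath
import math


def symbol(gamma, omega):
    """Scaling filter symbol m0(omega) = sum_k gamma_k exp(-i k omega)."""
    return sum(g * cmath.exp(-1j * k * omega) for k, g in enumerate(gamma))


def orthogonality_defect(gamma, samples=64):
    """Largest deviation of |m0(w)|^2 + |m0(w + pi)|^2 from 1 on a frequency grid."""
    worst = 0.0
    for i in range(samples):
        w = 2 * math.pi * i / samples
        value = abs(symbol(gamma, w)) ** 2 + abs(symbol(gamma, w + math.pi)) ** 2
        worst = max(worst, abs(value - 1.0))
    return worst


def sum_rule(gamma, l):
    """Left-hand side of the sum rule of order l: sum_k (-1)^k k^l gamma_k."""
    return sum((-1) ** k * k ** l * g for k, g in enumerate(gamma))


# Four-tap Daubechies scaling filter with sum(gamma) = 1 (assumed example).
s3 = math.sqrt(3)
d4 = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]

defect = orthogonality_defect(d4)
orders = [sum_rule(d4, l) for l in range(3)]
```

Here `defect` is at rounding level, and the sum rules of order 0 and 1 vanish while the one of order 2 does not, in line with the bound of $(n+1)/2 = 2$ zeros at $\pi$ for a filter of length 4.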

3 The Framework 

The main idea of our framework is the following: consider a linear combination
of linearly independent (in the vectorial sense) basic filters and solve the equation
system (1) for the coefficients of the linear combination; from this point of view
the linear independence means that no redundancy can occur and all solutions
(if they exist) are accessible. In this section we will now successively build such
families of linearly independent basic filters. These will additionally be chosen such
that they satisfy the Strang-Fix conditions up to a certain order.
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Lemma 1. Suppose, the filter [ 70 7 i • • • 7 n ] satisfies the sum rules exactly up 
to order $n-1$. Then the filter

\[ [\,\tilde\gamma_0\ \tilde\gamma_1\ \cdots\ \tilde\gamma_{n+1}\,] := [\,1\ 1\,] * [\,\gamma_0\ \gamma_1\ \cdots\ \gamma_n\,] \]

satisfies the sum rules exactly up to order $n$.

Proof. The proof is straightforward. Just evaluate

\[ \sum_{k=0}^{n+1} (-1)^k k^p \tilde\gamma_k = -\sum_{j=0}^{p-1} \binom{p}{j} \sum_{k=0}^{n} (-1)^k k^j \gamma_k . \]

The inner sum on the right side vanishes for $p = 0, 1, \ldots, n$ and is different from
zero for $p = n+1$ by the assumptions that were made. □

Corollary 1. For every $n \in \mathbb{N}^*$ the filter

\[ h^n := [\,1\ 1\,]^{*n} \]

satisfies the sum rules up to order $n-1$, where $\gamma^{*n}$ denotes the $n$-fold
repeated discrete convolution of $\gamma$.

Lemma 2. Define

\[ g^{l,m} := [\,1\ {-1}\,]^{*l} * [\,1\ 1\,]^{*m}. \]

Then, for all $m, l \in \mathbb{N}^*$ the filter $g^{l,m}$ satisfies exactly the sum rules of order
$m-1$.

The filters $h^n$ and the convolution filters $g^{l,m}$ with $l + m = n$ form a linearly
independent family of basic filters and will be very useful in the design of several
scaling filters, as we shall see in the following.



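Proposition 1. Let $A_n$ be the $(n+1) \times (n+1)$ matrix

\[ A_n = \begin{pmatrix}
h^n_0 & g^{1,n-1}_0 & g^{2,n-2}_0 & \cdots & g^{n,0}_0 \\
h^n_1 & g^{1,n-1}_1 & g^{2,n-2}_1 & \cdots & g^{n,0}_1 \\
\vdots & \vdots & \vdots & & \vdots \\
h^n_n & g^{1,n-1}_n & g^{2,n-2}_n & \cdots & g^{n,0}_n
\end{pmatrix}, \]

then

\[ \det A_n = (-2)^{n(n+1)/2}. \]

Especially, $\det A_n \neq 0$ for all $n \in \mathbb{N}^*$ and from this, we directly deduce that the
$n+1$ filters

\[ \{\, h^n,\ g^{1,n-1},\ g^{2,n-2},\ \ldots,\ g^{n,0} \,\} \]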
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form a linear independent family of basic filters and moreover, every subfamily 

$\{\, h^n, g^{1,n-1}, \ldots, g^{n-k,k} \,\}$
additionally satisfies all sum rules up to order $k-1$ by Corollary 1 and Lemma 2.
Furthermore, this family of basic filters yields a natural decomposition of every
scaling filter that satisfies the sum rules up to order $k-1$ into an even
(symmetric) and an odd (antisymmetric) part since $h^n$ is always even and $g^{j,n-j}$ is even
for $j$ even and odd for $j$ odd. Note further that the sum rule order decreases by
one with each symmetry switch.

Proof of Proposition 1. We will make use of the induction principle. For $n = 1$
we obtain $A_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and $\det A_1 = -2$. In the second step, we will evaluate
$\det A_{n+1}$ from $\det A_n$ by elementary matrix operations. In particular, we acquire

\[ \det A_{n+1} = (-2)^{n+1} \cdot \det A_n. \]

By induction, we obtain the desired relation. □
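
The basic filters and the determinant formula are easy to check for small $n$ with exact integer arithmetic. The sketch below assumes the reading $h^n = [1\ 1]^{*n}$ and $g^{l,m} = [1\ {-1}]^{*l} * [1\ 1]^{*m}$ of the definitions above; all function names are hypothetical.

```python
from fractions import Fraction


def convolve(a, b):
    """Discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def conv_power(base, times):
    """times-fold repeated convolution of base ([1] for times = 0)."""
    out = [1]
    for _ in range(times):
        out = convolve(out, base)
    return out


def basic_filter(l, m):
    """g^{l,m} = [1 -1]^{*l} * [1 1]^{*m}; l = 0 yields h^m."""
    return convolve(conv_power([1, -1], l), conv_power([1, 1], m))


def sum_rule_order(filt):
    """Number of leading sum rules sum_k (-1)^k k^p filt[k] = 0, p = 0, 1, ..."""
    order = 0
    while order < len(filt) and sum(
            (-1) ** k * k ** order * c for k, c in enumerate(filt)) == 0:
        order += 1
    return order


def basis_matrix(n):
    """Matrix A_n whose columns are h^n, g^{1,n-1}, ..., g^{n,0}."""
    columns = [basic_filter(l, n - l) for l in range(n + 1)]
    return [[columns[c][r] for c in range(n + 1)] for r in range(n + 1)]


def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det
```

A call such as `determinant(basis_matrix(3))` can then be compared with $(-2)^{n(n+1)/2}$, and `sum_rule_order(basic_filter(l, m))` with $m$.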



4 Examples 



Maximally flat filters. We will start with an example that leads to the classical
$D_8$ filter, i.e. an orthogonal filter with a zero of fourth order at the aliasing
frequency $\omega = \pi$. Therefore, consider a combination of linearly independent basic
filters of length eight that satisfy the Strang-Fix conditions up to order three.
By Proposition 1 such a filter is given by

\[ \gamma = \lambda_0 \cdot h^7 + \lambda_1 \cdot g^{1,6} + \lambda_2 \cdot g^{2,5} + \lambda_3 \cdot g^{3,4}. \]

Solving (1) for this filter yields the solution

\[ \lambda_0 = \frac{1}{128}, \qquad
   \lambda_1 = \frac{1}{384} \left( \sqrt{21 + 3\rho - 42\rho^{-1}}
   + \sqrt{42 - 3\rho + 42\rho^{-1} + 18\sqrt{105} \cdot (7 + \rho - 14\rho^{-1})^{-1/2}} \right), \]

\[ \lambda_2 = 64 \cdot \lambda_1^2 - \frac{7}{256}, \qquad \lambda_3 = \frac{\sqrt{35}}{128}, \]

where we used the abbreviation $\rho = \sqrt[3]{154 + 42\sqrt{15}}$. Thus we found a closed
form description for a Daubechies filter of length eight, which was impossible
using other filter design methods. In the same manner we can also find an explicit
analytical form for $D_{10}$. For bigger filters, the complexity increases too much and
prevents explicit forms. However, if one considers a filter

\[ \gamma = \lambda_0 \cdot h^n + \lambda_1 \cdot g^{1,n-1} + \lambda_2 \cdot g^{2,n-2} + \ldots + \lambda_{(n-1)/2} \cdot g^{(n-1)/2,(n+1)/2} \]
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of arbitrary even length and solves the system (1), one can at least verify that 



\[ \lambda_0 = 2^{-n}, \qquad \lambda_2 = 2^{n-1} \cdot \lambda_1^2 - \frac{n}{2^{n+1}}. \]

We additionally remark that the solutions for $\lambda_3, \ldots, \lambda_{(n-1)/2-1}$ can all be
written as rational functions in $\lambda_1$, while $\lambda_1$ itself is a root of a polynomial of degree
$2^{(n-3)/2}$ for $n \geq 7$.
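
A closed form of this kind can be cross-checked numerically. The sketch below assembles the length-eight filter from the basic filters $h^7, g^{1,6}, g^{2,5}, g^{3,4}$ with the coefficient values $\lambda_0 = 1/128$, $\lambda_2 = 64\lambda_1^2 - 7/256$, $\lambda_3 = \sqrt{35}/128$ and the radical expression for $\lambda_1$ in $\rho = \sqrt[3]{154 + 42\sqrt{15}}$, as we read them from this section, and then tests the constraints (1). The basic filter definition and all function names are assumptions of the sketch.

```python
import math


def convolve(a, b):
    """Discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def conv_power(base, times):
    """times-fold repeated convolution of base ([1] for times = 0)."""
    out = [1]
    for _ in range(times):
        out = convolve(out, base)
    return out


def basic_filter(l, m):
    """g^{l,m} = [1 -1]^{*l} * [1 1]^{*m}; l = 0 yields h^m."""
    return convolve(conv_power([1, -1], l), conv_power([1, 1], m))


def d8_filter():
    """Length-eight filter built from the closed-form coefficients."""
    rho = (154 + 42 * math.sqrt(15)) ** (1 / 3)
    lam0 = 1 / 128
    lam1 = (math.sqrt(21 + 3 * rho - 42 / rho)
            + math.sqrt(42 - 3 * rho + 42 / rho
                        + 18 * math.sqrt(105) / math.sqrt(7 + rho - 14 / rho))) / 384
    lam2 = 64 * lam1 ** 2 - 7 / 256
    lam3 = math.sqrt(35) / 128
    lams = [lam0, lam1, lam2, lam3]
    filters = [basic_filter(l, 7 - l) for l in range(4)]
    return [sum(lam * f[k] for lam, f in zip(lams, filters)) for k in range(8)]


def shifted_product(gamma, k):
    """sum_j gamma_j * gamma_{j+2k}, the quadratic form appearing in (1)."""
    return sum(gamma[j] * gamma[j + 2 * k] for j in range(len(gamma) - 2 * k))


gamma = d8_filter()
```

Multiplying `gamma` by $\sqrt{2}$ gives the coefficients in the normalization commonly tabulated for this filter, which allows a comparison with published values.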

Biorthogonal wavelets. Our framework can also be used to design
biorthogonal filters, which have some advantages over orthogonal filters in special
applications (e.g. symmetry for image compression). We distinguish between two cases of
biorthogonal filters: if a primal filter is given, dual filters can always be found
by solving a system of linear equations; this is the easy case and not considered
here (since these solutions can be obtained by several other design methods). On
the other hand, one can take two linear combinations of even basic filters and
solve their coefficients for the biorthogonality constraints

\[ m_0(\omega) \cdot \overline{\tilde m_0(\omega)} + m_0(\omega+\pi) \cdot \overline{\tilde m_0(\omega+\pi)} = 1, \qquad \tilde m_0(0) = m_0(0) = 1, \]

which again leads to a quadratic equation system. For example, considering a
symmetric pair of length 9 and 7 and imposing the maximal number of sum rules
on these filters, in particular, we take

\[ \gamma = \lambda_0 \cdot h^8 + \lambda_1 \cdot g^{2,6} + \lambda_2 \cdot g^{4,4} \quad\text{and}\quad \tilde\gamma = \mu_0 \cdot h^6 + \mu_1 \cdot g^{2,4}, \]

we obtain the classical and still widely used 9/7 image compression filter
that was first presented in [ABMD92]. Estimating the joint spectral radius of
the associated linear operators $(T_0)_{j,k} = \gamma_{2j-k-1}$ and $(T_1)_{j,k} = \gamma_{2j-k}$ restricted
to a certain invariant subspace $E$, we obtain the smoothness values $\alpha \approx 1.068$
and $\tilde\alpha \approx 1.701$ in terms of the Hölder exponent (these techniques are discussed
in detail in [DL92] and [Gri96]). To obtain better smoothness results, one could
give up one zero order of $\tilde\gamma$ (by adding $\mu_2 \cdot g^{4,2}$) and use this degree of freedom
to find better filters. Another wish could be the property that the coefficients
are rationals (as in the easy case), because this can reduce the computational
cost of the wavelet transform. In order to achieve these requirements, we use
a numerical heuristic that approximates

\[ \min \left\{ \max |\lambda| \;\middle|\; \lambda \in \mathrm{Spectrum}\left( T_0|_E(\mu_2),\ T_1|_E(\mu_2),\ \tilde T_0|_{\tilde E}(\mu_2),\ \tilde T_1|_{\tilde E}(\mu_2) \right) \right\} \]

at dyadic rational values $\mu_2$. Thereby one finds
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and the dual filter 

~ _ r 

' “ [ 64 32 64 16 64 32 64 J ' 

This new pair of filters is indeed very promising in applications such as image
compression. Its smoothness values are $\alpha \approx 1.48409$ and $\tilde\alpha \approx 1.67807$,
respectively. Note that $\tilde\alpha$ is only minimally worse than in the classical 9/7 case, while
the value for $\alpha$ is significantly better and, additionally, all the filter coefficients are
rational. Finally, we mention that this heuristic method does not guarantee
that there exist no better solutions than the given one.
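
The biorthogonality constraint can be illustrated on a much smaller symmetric pair than the 9/7 filters. The sketch below uses the well-known 5/3 pair, which is our choice of example and not a filter from the text, written as combinations of even basic filters and with each filter centred so that its symbol is real. The basic filter definition $g^{l,m} = [1\ {-1}]^{*l} * [1\ 1]^{*m}$ and the function names are assumptions.

```python
import math


def convolve(a, b):
    """Discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def conv_power(base, times):
    """times-fold repeated convolution of base ([1] for times = 0)."""
    out = [1]
    for _ in range(times):
        out = convolve(out, base)
    return out


def basic_filter(l, m):
    """g^{l,m} = [1 -1]^{*l} * [1 1]^{*m}; l = 0 yields h^m."""
    return convolve(conv_power([1, -1], l), conv_power([1, 1], m))


def centered_symbol(coeffs, omega):
    """Real symbol of a symmetric odd-length filter centred at index 0."""
    half = len(coeffs) // 2
    return sum(c * math.cos((k - half) * omega) for k, c in enumerate(coeffs))


def biorthogonality_defect(primal, dual, samples=64):
    """Largest deviation of m0*m0~ (at w) + m0*m0~ (at w + pi) from 1 on a grid."""
    worst = 0.0
    for i in range(samples):
        w = 2 * math.pi * i / samples
        value = (centered_symbol(primal, w) * centered_symbol(dual, w)
                 + centered_symbol(primal, w + math.pi)
                 * centered_symbol(dual, w + math.pi))
        worst = max(worst, abs(value - 1.0))
    return worst


# 5/3 pair as combinations of even basic filters (assumed example).
primal = [a / 16 - 3 * b / 16
          for a, b in zip(basic_filter(0, 4), basic_filter(2, 2))]
dual = [c / 4 for c in basic_filter(0, 2)]
defect = biorthogonality_defect(primal, dual)
```

Here the primal filter is $\frac{1}{16} h^4 - \frac{3}{16} g^{2,2}$ and the dual is $\frac{1}{4} h^2$; in this tiny case the coefficients follow from a linear system once the dual is fixed, which corresponds to the "easy case" mentioned in the text.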

5 The 2D Case 

In the same manner one can build two-dimensional filters from linear
combinations of basic filters. The main ideas of this conceptual generalization will be
described in this section. First, we will state a result similar to Lemma 1. In
that case, repeated convolutions with the sequences [1 1] and [1 −1] were
used to successively build longer filters with a higher sum rule order, and it turns
out that a similar thing can be done in higher dimensions.

Lemma 3. Suppose the two-dimensional filter $[\,\gamma_{jk}\,]_{j \in 1 \ldots n_x,\ k \in 1 \ldots n_y}$ satisfies
the (two-dimensional) sum rules up to order $m-1$. Then the filter

\[ [\,\gamma_{jk}\,] * \begin{bmatrix} & \alpha & \\ \beta & 2\alpha + 2\beta & \beta \\ & \alpha & \end{bmatrix} \]

satisfies the sum rules at least up to order $m$, if $\alpha, \beta \neq 0$.

The proof is similar to the one-dimensional case and is omitted. However, these
filters will not be sufficient; we additionally need some antisymmetric filters,
which we will get from the following lemma.

Lemma 4. Suppose the symmetric filter $[\,\gamma_{jk}\,]_{j \in 1 \ldots n_x,\ k \in 1 \ldots n_y}$ satisfies the
sum rules up to order $m-1$. Then for $\alpha \neq 0$ both of the antisymmetric filters

\[ [\,\gamma_{jk}\,] * \begin{bmatrix} -\alpha \\ 0 \\ \alpha \end{bmatrix}
   \quad\text{and}\quad
   [\,\gamma_{jk}\,] * \begin{bmatrix} \alpha & 0 & -\alpha \end{bmatrix} \]

fulfill the sum rules at least up to order $m$.

Taking a little care about some possible redundancies (since specific choices of $\alpha$
and $\beta$ may lead to linearly dependent filters) while using these lemmata, the linear
independence of the basic filters directly carries over to the two-dimensional case.
The thing that makes everything more difficult is the orthogonality constraint,
which now becomes

\[ |m_0(\omega_x, \omega_y)|^2 + |m_0(\omega_x + \pi, \omega_y + \pi)|^2 = 1. \]
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Assuming that the filter is rhombic shaped (this is in some sense the most con- 
venient and applicable case) and consists of $\nu$ double diagonals each of length
$\mu$, this leads us to $2\nu\mu - \nu - \mu + 1$ equations of the form

EE ^j—k,j+k-l "f ^j-k+l,j+k—l 
k=l j=l 

V n—1 

EE 1'j-k,j+k-l • ^j-k+T,j+k+(7-l + ^j-k+l,j+k-l ' ^j-k+T+l,j+k+a-l — 0 
k=l j=l 

with $(\tau, \sigma) \in \{[-\nu+1, -\nu+2, \ldots, \nu-1] \times [0, 1, \ldots, \mu-1]\} \setminus \{(0,0)\}$.

Example. We will now give an example of a two-dimensional orthogonal scaling
filter that satisfies the Strang-Fix conditions up to order one. Therefore, we
take the simple Haar scaling filters

[1 1] and [1 −1]

and apply Lemmata 3 and 4 to them. This gives us a linear combination



1 1 




■ -1 -1 




r 1 -1 1 




-1 -1 


15 5 1 
1 1 


+ Ai • 


1111 
-1 -1 


+ A2 • 


1 1 
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+ A3 • 


- 1-1 11 

1 1 



of four basic filters. Solving the orthogonality criterion for the coefficients $\lambda_k$, we
obtain



Ao — 



16’ 




\[ \lambda_2 = \frac{\pm\sqrt{3}}{16} \quad\text{and}\quad \lambda_3 = \frac{\pm\sqrt{3}}{8}, \]



which reproduces the Kovacevic-Vetterli scaling filter (see [KV92]), the first
known orthogonal 2D filter that leads to a continuous wavelet for the important
quincunx sampling grid. Note that we again found a decomposition of the filter
into an even and an odd part, where the even part satisfies the sum rules up
to order two and the odd part up to order one; everything is very similar to
the 1D case. We only need more basic filters because more constraints are to be
considered. We should remark that since $\mu = \nu = 2$ in the previous example, we
would have to satisfy five constraint equations and thus we should use five basic
filters instead of four, but it turns out that one of the coefficients always gets
zero. For filters that satisfy the sum rules up to a higher order (e.g. for order 
two, one has to choose at least v >i and p > 4 or vice versa), the orthogonality 
constraints seem to be solvable only numerically, because of the rapidly increas- 
ing complexity of the related nonlinear equation system. 



Finally, we remark that the presented framework could also be used to
design two-dimensional biorthogonal filters. But due to the symmetry properties
of these, the McClellan transform can be used to derive 2D filters directly from
their 1D prototypes, which is much faster to implement. Thus, the direct usage
of basic filters seems to make less sense if one is interested in two-dimensional
biorthogonal filters.
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6 Discussion and Conclusion 

A framework for the design of wavelet filters was presented, which can be
generalized to higher dimensions. There are very few different approaches to direct
multidimensional orthogonal filter design. The most important among these is
the paraunitary polyphase decomposition due to Vaidyanathan ([VH88]). But
since his building matrices do not commute in general, the a priori ordering
of these matrices is not clear and thus there is no unique representation of all
possible orthogonal filters of a given shape, which can be obtained by the
proposed method. However, numerical experiments lead to the conjecture that both
methods yield the same filter families. It is intended to apply the
multidimensional wavelets that stem from these approaches to optic flow estimation and
to image feature detection within the scope of the author's further research. The
presented variations for 1D biorthogonal wavelets and their 2D counterparts
(built via the McClellan transform) seem to have nice properties for image
compression, and some cooperation with researchers from this area is planned.

Acknowledgements. The author would like to thank G. Sommer, B. Engelke
and S. Vukovac.
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Abstract. Linear statistical models of shape variability of identihable 
point sets have previously been described and applied successfully to 
the empirical modeling of appearance variability in natural images. One 
of the limitations of these linear models has been demonstrated in the 
nonlinear “bending” shape variability of point sets where a length ratio 
is constant. 

We point out that modeling point set variability with groups of
transformations generated by linear vector fields constitutes an algebraic frame
for modeling simple nonlinear point set variability suitable for the
modeling of shape variability. As an example, the very simple "bending" shape
variability of three points in the complex plane is in this way generated
by a linear vector field described by a complex 3 × 3 matrix.

Keywords: Point Sets, Nonlinear Variability, Shape Space, Lie Groups, 
Linear Vector Fields. 



1 Introduction 

Shape is often defined as whatever is left when position, size and orientation are 
ignored [1,2]. Abstractly the set of shapes can be defined as a set of equivalence 
classes, where the equivalence is “equal except for translation, scaling and ro- 
tation” . Often these equivalence classes are modeled by a choice of a canonical 
element from each class. 

When one considers a finite set of uniquely identifiable points in the plane,
space or generally $\mathbb{R}^m$, an explicit assumption of invariance of variability
under centroid translation, scaling and rotation leads to the statistical theory of
shape as introduced by David G. Kendall in 1977 [2]. In this theory the space of
shapes is modeled as a Riemannian manifold. For an in-depth coverage please
consult one of the two books "The Statistical Theory of Shape" [3] by
Christopher G. Small and "Statistical Shape Analysis" [4] by Ian L. Dryden and Kanti
V. Mardia.

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 259—268, 2000. 

(c) Springer- Verlag Berlin Heidelberg 2000 
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To keep things simple only point sets in the plane will be considered. Though 
this restriction avoids the non-trivial generalizations of rotations and the compli- 
cations of singular point set configurations invariant under non-trivial subgroups 
of rotations [3, p. 84], it is not essential for the modeling by 1-parameter groups 
of transformations.^ 

The set of n-point sets in the plane $(\mathbb{R}^2)^n$ is conveniently modeled by the
complex vector space $\mathbb{C}^n$, since the rotation of an n-point set is then simply
given by multiplication with a unit complex number. This identification is only
introduced to get a short and convenient notation, and we will still consider $\mathbb{C}^n$
as a 2n-dimensional real vector space.



1.1 Kendall’s Manifold of Shapes of Point Sets in the Plane 

To fix terminology and notation we briefly review Kendall's shape space of
n-point sets in the plane. First, position normalization is done by orthogonal
projection onto the subspace of "centered point sets"
$C_0^{n-1} = \{(z_1, \ldots, z_n) \in \mathbb{C}^n \mid \sum_{k=1}^{n} z_k = 0\}$ with real dimension $2n-2$.
Size normalization is done by scaling onto the manifold of "pre-shapes"
$S^{2n-3} = \{p_0 \in C_0^{n-1} \mid \|p_0\|_2 = 1\}$.^ $S^{2n-3}$ is
thus a sphere of real dimension $2n-3$ in the linear subspace $C_0^{n-1}$ of $\mathbb{C}^n$.

Now the manifold of shapes of n-point sets in the plane can be identified
with the complex projective space $\mathbb{C}P^{n-2} = \{\{z p_0 \mid z \in \mathbb{C}\} \mid p_0 \in C_0^{n-1} \setminus \{0\}\}$.

Kendall's "Shape Space" is a Riemannian manifold with the Procrustes
metric:

\[ d(\Sigma(p), \Sigma(q)) = \cos^{-1}\Bigl(\Bigl|\sum_{k=1}^{n} p_k q_k^*\Bigr|\Bigr), \qquad p, q \in S^{2n-3}, \tag{1} \]

where $\Sigma(p) = \{zp \mid z \in \mathbb{C}\} \in \mathbb{C}P^{n-2}$ is the equivalence class representing the shape
of $p$ [3, p. 13]. In the following we shall simply refer to $C_0^{n-1}$ as $C_0$ without
explicitly noting the dimension.
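
The two normalization steps and the Procrustes distance translate directly into complex arithmetic. The following is a minimal sketch, assuming point sets are given as lists of Python complex numbers; the function names are hypothetical, and the clamp before `acos` only guards against rounding slightly above 1.

```python
import cmath
import math


def pre_shape(points):
    """Center a point set at its centroid and scale it to unit norm."""
    n = len(points)
    centroid = sum(points) / n
    centered = [z - centroid for z in points]
    norm = math.sqrt(sum(abs(z) ** 2 for z in centered))
    if norm == 0:
        raise ValueError("coincident points have no pre-shape")
    return [z / norm for z in centered]


def procrustes_distance(p, q):
    """Procrustes distance between the shapes of two n-point sets."""
    a, b = pre_shape(p), pre_shape(q)
    inner = abs(sum(x * y.conjugate() for x, y in zip(a, b)))
    return math.acos(min(1.0, inner))


triangle = [0j, 1 + 0j, 1j]
# Same shape: rotated by 0.7 rad, scaled by 2.5 and translated.
moved = [cmath.exp(0.7j) * 2.5 * z + (3 - 2j) for z in triangle]
# Different shape: one vertex pulled towards the base.
squashed = [0j, 1 + 0j, 0.2j]

same = procrustes_distance(triangle, moved)
different = procrustes_distance(triangle, squashed)
```

The first distance vanishes up to rounding because translation, scaling and rotation are factored out, while the second is clearly positive.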



1.2 Linear Point Set Models of Local Shape Variation 

In "Active Shape Models" [5] Cootes et al. use a linear model of local point set
variation. The example point sets whose variation they model have all
been translated, scaled and rotated to match as well as possible^ a mean point
set, which has been given a standard position, size and orientation and is found
iteratively. They use a local linear model of the variation of the matched point
sets:

\[ p = \bar p + P b, \tag{2} \]

^ We have not yet analyzed our compatibility requirement of commutativity with the
group of centroid-centered 3D rotations, but expect that it is possible to find
non-trivial 1-parameter groups generated by linear vector fields commuting with it.

^ When all points are coincident this is not possible. Thus these point sets are excluded.
^ They use a non-trivial weighting of the distances of corresponding points. This is
not essential for the point made here.
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where p € (R^)" is a point set and P is a matrix, which consists of the eigenvec- 
tors of the covariance matrix describing the example point sets' deviations from
the mean point set $\bar p$. The variable $b$ is a column vector of parameters describing
the point set deviation from the mean $\bar p$. It is a linear model of local shape
variation in the sense that the point set deviation from the mean depends linearly
on the parameters $b$.

They give an example of shape variability where this local linear model is 
inappropriate. The example is point sets along the outline of worms, and the 
problem is that they can bend at the middle. This bending is intuitively a 1- 
dimensional shape variability, so even though translation, scaling and rotation 
have been explicitly removed, there is one significant variation left. 

The problem is that it is not possible to model the bending exactly by a 
1-dimensional local linear model. When the bending is large, 2 eigenvectors are 
needed to span the example point set configurations. This problem of bending 
can be studied in the very simple setting of only three points in the plane allowed 
to vary freely under the constraint that a length ratio is preserved. 

This is an inherent problem of modeling the variation of point sets on the unit
sphere of pre-shapes $S^{2n-3}$. This sphere is curved and thus any large variation
will be of a nonlinear nature. To overcome this difficulty Kent [6] uses a tangent
space approximation to the pre-shape manifold.

However it is still possible to think of a 1-dimensional shape variability which 
after projection on a tangent space has a nonlinear image. In the section on 
the bending of three points in the plane we shall see an example where the 
preserved length ratio is different from 1 and where the projected shape variation 
is nonlinear. 



1.3 This Article 

We have thus been inspired to study differential geometric modeling of
nonlinear point set and shape variability. The theory of Lie groups and their Lie
algebras provides a framework for modeling nonlinear continuous 1-dimensional
modes of variation (variability) by 1-parameter groups of transformations. These
1-parameter groups are generated by vector fields describing the modeled
variability. The linear point set models are in this framework generated by the
constant vector fields. However, in the context of modeling shape variability the
requirement of commutativity with scaling naturally leads to the study of point
set variability generated by linear vector fields.

This article not only deals with the subject of modeling nonlinear point set 
variability, but also provides an analysis of the modeling of point set variability 
in the context of the inherently nonlinear shape space of point sets. 

2 Variability Modeled by Group of Transformations 

A 1-dimensional point set variability may be modeled by a 1-parameter group
$(T_t)_{t \in \mathbb{R}}$ of transformations of $(\mathbb{R}^2)^n$. That is, it is assumed that all variations
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originates from the same variability and are additively parameterized: 



\[ T_t \circ T_s = T_{t+s}. \tag{3} \]

Thus, a variation corresponding to a finite change is modeled by a transfor- 
mation, while a variability is given by a continuous family of transformations 
corresponding to the continuum of degrees of variation describing a continuous 
course of change [7]. 

In this frame the linear model is written as a 1-parameter group $(T_t)_{t \in \mathbb{R}}$ of
translation transformations of $(\mathbb{R}^2)^n$

\[ \tilde p = T_t(p) = p + t\,\Delta p, \tag{4} \]

where $\Delta p \in (\mathbb{R}^2)^n$ specifies the direction of the linear variability.

Higher dimensional variabilities may be modeled by independent
1-dimensional variabilities.^ Two 1-parameter groups $(X_t)_{t \in \mathbb{R}}$ and $(Y_s)_{s \in \mathbb{R}}$ are
capable of describing independent variabilities if they commute:

\[ \forall t \in \mathbb{R}, \forall s \in \mathbb{R} : \quad X_t \circ Y_s = Y_s \circ X_t. \tag{5} \]



2.1 Continuous Variability Generated by a Vector Field 

Just as the linear variability above was described by $\Delta p \in (\mathbb{R}^2)^n$, a nonlinear
variability may intuitively be described by a vector field on $(\mathbb{R}^2)^n$, which
everywhere points in the direction of change under the studied variability. Consider
the integral curves $(\gamma_p)_{p \in (\mathbb{R}^2)^n}$ defined by the action of a 1-parameter group
$(T_t)_{t \in \mathbb{R}}$:

\[ \gamma_p(t) = T_t(p), \qquad t \in \mathbb{R},\ p \in (\mathbb{R}^2)^n. \tag{6} \]

The derivatives of these curves define a vector field $X : (\mathbb{R}^2)^n \to (\mathbb{R}^2)^n$ satisfying

\[ \gamma_p'(t) = X(\gamma_p(t)), \qquad \forall t \in \mathbb{R}. \tag{7} \]

On the other hand, such a vector field uniquely identifies a 1-parameter group
[8, p. 37]. In this way the vector field $X$ is said to generate a 1-parameter group
of transformations $(X_t)_{t \in \mathbb{R}}$, where the parameter indicates how far along the
variability the point set should be transformed.
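
For a linear vector field $X(p) = Ap$ the generated 1-parameter group is the matrix exponential $X_t = e^{tA}$, which makes the group law easy to check numerically. The sketch below works on a single point of $\mathbb{R}^2$ for brevity and uses a truncated power series for the exponential; both simplifications, and the function names, are assumptions of the illustration.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def mat_exp(a, t, terms=30):
    """exp(t * a) by a truncated power series (adequate for small |t| * norm(a))."""
    n = len(a)
    result = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (t * a)^k / k!
        term = mat_mul(term, [[t * x / k for x in row] for row in a])
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result


def flow(a, t, p):
    """Transport the point p for parameter t along the field X(p) = a p."""
    m = mat_exp(a, t)
    return [sum(m[r][c] * p[c] for c in range(len(p))) for r in range(len(p))]


# Generator of plane rotation (multiplication by i in real coordinates).
J = [[0.0, -1.0],
     [1.0, 0.0]]
rotated = flow(J, 0.3, [1.0, 0.0])

# Group law check for an arbitrary linear field.
A = [[0.2, -1.0],
     [1.0, 0.1]]
two_steps = flow(A, 0.5, flow(A, 0.3, [1.0, 2.0]))
one_step = flow(A, 0.8, [1.0, 2.0])
```

The point `rotated` equals $(\cos 0.3, \sin 0.3)$ up to rounding, and `two_steps` agrees with `one_step`, which is the additive parameterization $X_t \circ X_s = X_{t+s}$.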

We now observe that, when logarithmically parameterized, the group of point
set scalings $\sigma_s(p) = e^s p$ is generated by the linear vector field given by the
identity transformation $I$ of $(\mathbb{R}^2)^n$:

\[ \frac{\partial}{\partial s}\Big|_{s=0} \sigma_s(p) = p = I(\sigma_0(p)). \tag{8} \]

^ For simplicity we have here excluded variabilities with other topologies and inherent 
non-commutativity. Such a variability can be modeled by a non-commutative Lie 
group of dimension higher than 1. 
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Similarly using complex notation it is seen that rotation pg{p) = is generated 
by the linear vector field given by $iI$:

\[ \frac{\partial}{\partial \theta}\Big|_{\theta=0} \rho_\theta(p) = ip = iI(\rho_0(p)). \tag{9} \]



It is well known that point set centroid translations $\tau_a(p) = (p_1 + a, \ldots, p_n + a)$
do not commute with scaling and rotation.^ We observe that this holds in general
for the (non-trivial) linear variabilities generated by the constant vector fields.
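
These commutation properties can be observed directly on a small point set in the complex plane. The sketch below assumes the logarithmic scaling $e^s p$ and the rotation $e^{i\theta} p$; the function names are hypothetical.

```python
import cmath
import math


def scale(s, p):
    """Logarithmically parameterized scaling: p -> exp(s) * p."""
    return [math.exp(s) * z for z in p]


def rotate(theta, p):
    """Rotation about the origin: p -> exp(i * theta) * p."""
    return [cmath.exp(1j * theta) * z for z in p]


def translate(a, p):
    """Translation of every point by the same complex offset a."""
    return [z + a for z in p]


def close(p, q, tol=1e-12):
    """True if two point sets agree point by point up to tol."""
    return all(abs(x - y) <= tol for x, y in zip(p, q))


p = [0j, 1 + 0j, 1j]

scaling_is_additive = close(scale(0.2, scale(0.3, p)), scale(0.5, p))
scale_rotate_commute = close(scale(0.3, rotate(0.7, p)), rotate(0.7, scale(0.3, p)))
translate_scale_commute = close(translate(1 + 1j, scale(0.3, p)),
                                scale(0.3, translate(1 + 1j, p)))
```

Scaling and rotation commute and scaling is additively parameterized, while translating before or after scaling gives different point sets.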



2.2 Well Defined Shape Variability Generated by Linear Vector 
Fields 

In the context of point set shape variability it is natural to consider only point 
set variabilities commuting with the shape similarity group of point set trans- 
lations, scalings, and rotations. Such point set variabilities induce well defined 
shape variabilities. The induced shape transformations are defined by applying 
the corresponding point set transformations on all the point sets in a shape 
equivalence class. Because of commutativity with the shape similarity group the 
resulting point sets will all belong to one and the same shape equivalence class, 
thus providing a well defined shape transformation. 

The problem of non-commutativity between the linear variabilities and scaling, and the wish to model nonlinear bending, naturally lead to higher order modeling of the generating vector fields. As a first simple step towards the general case it is natural to consider 1st-order vector fields.

Since commutativity between 1-parameter groups corresponds to commutativity of the generating vector fields^ [9, lemma 13 p. 5-35] and the Lie bracket between linear vector fields $X, Y$ is given by $[X, Y] = X \circ Y - Y \circ X$ [8, p. 87], it is seen that the requirement of commutation with scaling (generated by the identity vector field $I$) is automatically fulfilled when considering real linear vector fields on $(\mathbb{R}^2)^n$. In order to secure commutation with centroid translations $(\tau_a)_{a \in \mathbb{R}^2}$, and because we only want variabilities which do not change the centroid position,^ we only consider real linear vector fields on the subspace of centered point sets $C_0$, extended to zero on the orthogonal subspace $C_0^{\perp}$ modeling the centroid position.

It remains to analyze the requirement of commutativity with point set rotation. Not all real linear vector fields on $C_0 \subset (\mathbb{R}^2)^n$ commute with the vector field generating point set rotation, which is most easily expressed as $iI$ using the

^ One may, without changing the defined shape space, choose to consider centroid centered scaling and rotation, which do commute with centroid translation. These centered scalings and rotations are generated by the linear vector fields given by the identity $I_{C_0}$ on the subspace of centered point sets and $iI_{C_0}$ [7].

^ Two vector fields commute when their Lie bracket is zero.

^ This is an orthogonality constraint stating that the vector field should be everywhere orthogonal to the two constant vector fields $(1, \ldots, 1)$ and $(i, \ldots, i)$ generating centroid translations.
complex representation $(\mathbb{R}^2)^n = \mathbb{C}^n$. From this it is seen that the complex linear vector fields do commute with rotation.

To summarize [7]: 

— Commutativity with point set translations is obtained by working in the subspace of centered point sets.

— Commutativity with centered scaling is obtained by real linearity of the vector field on the subspace $C_0 \subset (\mathbb{R}^2)^n$ of centered point sets.

— Commutativity with centered rotation is obtained by complex linearity of the vector field on the subspace $C_0 \subset \mathbb{C}^n$ of centered point sets.

The above arguments have only resulted in sufficient conditions for a vector 
field to describe a well defined shape variability. They are not necessary. 

2.3 1-Parameter Groups Generated by Complex Linear Vector
Fields

Consider the Lie group $GL(n, \mathbb{C})$ of nonsingular $n \times n$ complex matrices. This has Lie algebra $gl(n, \mathbb{C})$, which consists of all $n \times n$ complex matrices. A complex linear vector field $X : \mathbb{C}^n \to \mathbb{C}^n$ is represented by a complex $n \times n$ matrix $X \in gl(n, \mathbb{C})$. The corresponding 1-parameter group of transformations $(X_t)_{t \in \mathbb{R}}$ is obtained by considering the usual matrix exponential map $\exp : gl(n, \mathbb{C}) \to GL(n, \mathbb{C})$ [10, p. 283]:

\[ X_t = \exp(tX) = I + tX + \frac{1}{2!}(tX)^2 + \frac{1}{3!}(tX)^3 + \cdots. \tag{10} \]
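As a rough illustration of the exponential map above, the following sketch evaluates the truncated power series numerically. The function names `expm_series` and `one_parameter_group`, the number of terms, and the example fields are assumptions made for this illustration only; a production implementation would more likely call a library matrix exponential.

```python
import numpy as np


def expm_series(M, terms=40):
    """Approximate exp(M) by the truncated series I + M + M^2/2! + ..."""
    M = np.asarray(M, dtype=complex)
    n = M.shape[0]
    result = np.eye(n, dtype=complex)
    term = np.eye(n, dtype=complex)
    for k in range(1, terms):
        term = term @ M / k      # term now holds M^k / k!
        result = result + term
    return result


def one_parameter_group(X, t, terms=40):
    """Return the transformation X_t = exp(tX) as a matrix."""
    return expm_series(t * np.asarray(X, dtype=complex), terms)


# The field iI generates rotation by e^{it}; the identity I generates scaling by e^{t}.
rotation_field = 1j * np.eye(2)
scaling_field = np.eye(2)
```

The truncation is adequate only for matrices of moderate norm; for larger `t` more terms (or a scaling-and-squaring routine) would be needed.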

3 Preserved Length Ratio - Nonlinear “Bending” 

Complex linear vector fields commute with both scaling and rotation, and they 
can describe the nonlinear bending of three points in the plane as a one dimen- 
sional variability. 

The simplest non-trivial example of a shape space is the shape space of three points in the plane, $\Sigma_2^3$, which is a 2-dimensional Riemannian manifold isometric to the sphere $S^2(1/2)$ in $\mathbb{R}^3$ with radius 1/2 [3, p. 70, 73]. It can thus be visualized by orthogonal projections on three orthogonal planes. In the following we will describe how the nonlinear bending shape variability of three points is generated by a complex linear vector field.



3.1 Centered Bending of 3 Points in the Plane 

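Centered bending of 3 points in the plane $p_1, p_2, p_3 \in \mathbb{R}^2$ with centroid position $p_c \in \mathbb{R}^2$ and the constant distances $|p_1 - p_2| = l_1$ and $|p_2 - p_3| = l_2$ can be parameterized by the angle $\theta$:

\[ p_1 = \frac{-2 l_1 e^{i\theta} - l_2 e^{-i\theta}}{3} + p_c, \tag{11} \]

\[ p_2 = \frac{l_1 e^{i\theta} - l_2 e^{-i\theta}}{3} + p_c, \tag{12} \]

\[ p_3 = \frac{l_1 e^{i\theta} + 2 l_2 e^{-i\theta}}{3} + p_c. \tag{13} \]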
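The above family of point sets can be considered as the result of “centered bending” of $(p_1(0), p_2(0), p_3(0))$ by the angle $\theta$. But it can also be considered as the result of centered bending of $(p_1(\theta_0), p_2(\theta_0), p_3(\theta_0))$ by the angle $\theta - \theta_0$. In this way a 1-parameter group $(B_\theta)_{\theta \in \mathbb{R}}$ of “centered bending” transformations is defined. The vector field $B : (\mathbb{R}^2)^3 \to (\mathbb{R}^2)^3$ generating the 1-parameter group of centered bending of a 3-point set is seen to be given by the derivative $\frac{\partial}{\partial\theta} p(\theta)$:

\[ \frac{\partial p_1}{\partial\theta} = \frac{-2 l_1 i e^{i\theta} + l_2 i e^{-i\theta}}{3} = \frac{i}{3}\bigl(2p_1(\theta) - 3p_2(\theta) + p_3(\theta)\bigr), \tag{14} \]

\[ \frac{\partial p_2}{\partial\theta} = \frac{l_1 i e^{i\theta} + l_2 i e^{-i\theta}}{3} = \frac{i}{3}\bigl(-p_1(\theta) + p_3(\theta)\bigr), \tag{15} \]

\[ \frac{\partial p_3}{\partial\theta} = \frac{l_1 i e^{i\theta} - 2 l_2 i e^{-i\theta}}{3} = \frac{i}{3}\bigl(-p_1(\theta) + 3p_2(\theta) - 2p_3(\theta)\bigr). \tag{16} \]

This is seen to be a linear vector field, given by the matrix

\[ B = \frac{i}{3}\begin{pmatrix} 2 & -3 & 1 \\ -1 & 0 & 1 \\ -1 & 3 & -2 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 0 & -2 & 0 & 3 & 0 & -1 \\ 2 & 0 & -3 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 \\ -1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & -3 & 0 & 2 \\ -1 & 0 & 3 & 0 & -2 & 0 \end{pmatrix}. \tag{17} \]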



The above equations use first complex and then real notation, corresponding to the discrimination between $\mathbb{C}^3$ and $(\mathbb{R}^2)^3 = \mathbb{R}^6$.

Both the odd and the even columns of $B$ add to $0 \in \mathbb{R}^6$, in agreement with the fact that $B$ defines a vector field which is zero on the subspace modeling the position of the point set centroid. Similarly, both the even and the odd rows add to $0 \in \mathbb{R}^6$, proving that this vector field is orthogonal to the directions for translating the centroid. The vector field $B$ thus commutes with and is orthogonal to the vector fields for translation in the x and y directions. Since $B$ is linear and represented by a complex matrix, it also commutes with the vector fields generating scaling and rotation.
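The stated properties of the bending field can be checked numerically. The sketch below assumes the complex form of the matrix is $(i/3)$ times the integer matrix with rows $(2, -3, 1)$, $(-1, 0, 1)$, $(-1, 3, -2)$, and that a real 6-vector interleaves the $x$ and $y$ coordinates of the three points; the helper names `bent_points`, `bent_velocity` and `real_form` are hypothetical.

```python
import numpy as np

# Complex 3x3 form of the bending field (assumed reading of the matrix in the text).
B = (1j / 3) * np.array([[2, -3, 1],
                         [-1, 0, 1],
                         [-1, 3, -2]], dtype=complex)


def bent_points(theta, l1, l2, pc=0.0):
    """Point set (p1, p2, p3) of the centered bending family, as complex numbers."""
    ep, em = np.exp(1j * theta), np.exp(-1j * theta)
    return np.array([(-2 * l1 * ep - l2 * em) / 3 + pc,
                     (l1 * ep - l2 * em) / 3 + pc,
                     (l1 * ep + 2 * l2 * em) / 3 + pc])


def bent_velocity(theta, l1, l2):
    """Derivative of bent_points with respect to theta (the centroid drops out)."""
    ep, em = np.exp(1j * theta), np.exp(-1j * theta)
    return np.array([(-2j * l1 * ep + 1j * l2 * em) / 3,
                     (1j * l1 * ep + 1j * l2 * em) / 3,
                     (1j * l1 * ep - 2j * l2 * em) / 3])


def real_form(C):
    """6x6 real matrix of a complex 3x3 matrix, with (x, y) pairs interleaved."""
    R = np.zeros((6, 6))
    R[0::2, 0::2] = C.real
    R[0::2, 1::2] = -C.imag
    R[1::2, 0::2] = C.imag
    R[1::2, 1::2] = C.real
    return R
```

With these definitions, `B @ bent_points(theta, l1, l2, pc)` should agree with `bent_velocity(theta, l1, l2)` for any centroid `pc`, and the odd and even rows and columns of `real_form(B)` should each add to the zero vector.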

As an indirect illustration of the vector field represented by $B$, the generated 1-parameter group $(B_\theta)_{\theta \in \mathbb{R}}$ has been evaluated for a few different angles using the matrix exponential. These have been applied to two different 3-point configurations (see Figure 1). Since the linear vector field $B$ commutes with point set translation, scaling and rotation, it generates a 1-parameter group of point set transformations inducing a well defined shape variability by acting on the point sets in the equivalence class representing the shape. This shape variability is illustrated in Figure 2. The figure shows orthogonal projections of $S^2(1/2)$ on three orthogonal planes. The two coordinate axes in the projection plane and the axis coming up from the paper have been illustrated with a small corresponding point set configuration. In the left column a close sampling of the shape of $B_\theta(-1, 1, \sqrt{3}) = \exp(\theta B)(-1, 1, \sqrt{3}) \in \mathbb{C}^3 = (\mathbb{R}^2)^3$, for $\theta = 0, \pi/60, \ldots, \pi$, has been marked by a “+”. In the right column 7 samples ($\theta = 0, \pi/6, \ldots, \pi$) along the bending have been marked by small point-set configurations. The first and the last of these 7 shapes are the same, but the point set figures have been rotated 180 degrees relative to each other.

Fig. 1. Centered bending of 3-point set in the plane. The figure shows the point sets $B_\theta(-1, 0, (1 + \sqrt{3})/2) = \exp(\theta B)(-1, 0, (1 + \sqrt{3})/2) \in \mathbb{C}^3 = (\mathbb{R}^2)^3$ and $B_\theta(-1, 1, \sqrt{3})$ for $\theta = 0, \pi/12, \ldots, \pi/2$.

The starting point set configuration has been chosen so that the bending passes through the point $(x, y, z) = (0, 1/2, 0)$. This is most easily seen in the bottom left projection, which can be considered a projection on Kent’s tangent subspace approximation to the shape space at the shape $(0, 1/2, 0)$. It is thus seen that the bending shape variability of a shape with a preserved ratio of $2/(1 - \sqrt{3})$ does not have a linear image in this subspace. You may also note that the non-equidistant spacing of the bent shapes is not in harmony with the Procrustes distance, but corresponds to the choice of additive parameterization.

4 Conclusions 

We have described how Lie groups provide a framework for modeling general 
variability. The classical linear models of point set variability are in this frame- 
work modeled by the Lie groups generated by the constant (0-order) vector 
fields. 

We have found that in the context of shape variability, the natural constraint of commutativity with the similarity group provides an inherent need for nonlinear modeling of point set variability. By considering Lie groups generated by 1st-order vector fields, we are able to model:

— Nonlinear point set variabilities commuting with scaling and rotation, thus 
inducing well defined shape variabilities. 

— A larger class of variabilities which include the nonlinear bending of 3 points 
in the plane. 

Future research will focus on methods of inferring these variabilities. 
Abstract. We characterise aspects of our worlds (great and small) in formalisms that exhibit symmetry; indeed symmetry is seen as a fundamental aspect of any physical theory. These symmetries necessarily have an impact on the way systems exhibit reactive behaviour in a given world, for a symmetry determines an equivalence between states, making it appropriate for a reactive system to respond identically to equivalent states. We develop the concept of a General Transfer Function (GTF) considered as a building block for reactive systems, define the concept of full symmetry operator acting on a GTF, and show how such symmetries induce a quotient structure which simplifies the process of building an invertible domain model for control.



1 Introduction 

This paper explores the relationship between symmetries of the world and sym- 
metries of the (generalised) transfer functions which are used to characterise the 
response of a reactive system. It is written from the perspective of Artificial 
Intelligence to the extent that we consider some principles for automating
problem reductions which might in many cases be “obvious” to humans.

A symmetry of a physical theory is an invertible mapping of the space in 
which the theory is expressed to itself under which the theory is invariant. For 
example, the symmetries of Newtonian mechanics are drawn from the Galilean 
group of symmetries of space. This consists of translations and rotations and 
uniform translatory motion (but not of rotatory motion) . 

In characterising a reactive system it is adequate to take a special case of a 
physical theory — for example we characterise the gravitational field as uniform 
rather than using the full Newtonian formulation of gravitation. However, a 
reactive (or adaptive) system does not sense the world directly, but only through 
its sensors so that taking such a limited view has advantage for the explanation 
of the behaviour of adaptive biological systems: the world is modelled at a level 
closer to what may be perceived. In general, sensor space is not isomorphic 

* This work was supported in part by the NSF (GDA-9703217), by AFRL/IFTD (F30602-97-2-0032), and by DARPA/ITO/SDR DABT63-99-1-0022.

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 269-283, 2000.

(c) Springer-Verlag Berlin Heidelberg 2000
to world-space. The question then naturally arises, how do world-symmetries
relate to symmetries of transfer functions which are used to characterise reactive 
systems? In particular, is there a useful concept of the symmetry of a controller 
and of a plant? 

The disposition of matter in space has the effect of reducing the symmetry 
of the world from the point of view of reactive systems existing therein. Never- 
theless, residual symmetry frequently remains and is important for determining 
the possible behaviour of a reactive system. When we say that an object or
world feature has a symmetry we are in essence identifying an invertible map- 
ping from the object (or feature) to itself under which it is invariant. In the case 
of rigid bodies, all symmetries are members of the Special Euclidean group of 
translations and rotations in 3-space. 

Why should we be interested in symmetric controllers? The advantage lies 
in the possibility of being able to handle discrete event dynamic systems by 
combining a repertoire of controllers. We can regard a controller as establishing
a property (such as maintaining stability in a gravitational field). If a controller 
establishes just that property and no other, then it may be combined with an- 
other controller which establishes another property (forward locomotion, say). 
Thus a space of behaviours can be spanned by combining elementary controllers. 

But with a given property may be associated world-transformations that 
preserve the property. A controller that maintains stability in a gravitational 
field should be invariant with respect to the symmetries of that field. In the case 
of a linear controller, our concept of input symmetry can be related to that of the 
null-space of the controller. Our group-theoretic formulation has the advantage 
of being a generalisation to systems that may be non-linear, non-differentiable 
or even non-continuous. 

2 Previous Work 

The idea of classifying physical theories in terms of symmetry groups is due to 
Noether[8]. Noether’s Theorem is a very general result which shows that any 
physical theory couched in variational terms necessarily has conservation laws 
related to the symmetries of the space in which the theory is expressed. While 
this has applications to the more exotic groups associated with modern physical 
theories, the historical development of physics can also be seen as a progression 
from theories of more restricted symmetry to those of less restricted symmetry. 
Thus Newtonian mechanics, characterised by the Galilean group, provides a more 
symmetric world-view than the Aristotelian. 

In the 1980s one of us, cognisant of the importance of group theory in physics,
sought to apply it to robotics — specifically to the characterisation of spatial rela- 
tionships between body features established during assembly [12]. Subsequently 
Liu[7] demonstrated the practicality of this approach by developing a computa- 
tionally tractable representation of subgroups of the Euclidean group, providing 
a software implementation thereof together with theoretical justification. Earlier 
Zahnd and Nair [13] had provided a more limited approach. It should be noted 
that what needs to be represented is not a member of the Euclidean group but
a subgroup embedded in it. 

In psychology, Michael Leyton [5] is a pioneer in the use of group theory
as a basis for understanding perception. His view is that the mind perceives a 
shape in terms of a causal history of how it was formed, so that a deformed 
can is perceived as resulting from the act of denting it. Such deformations are 
not members of the Euclidean Group; indeed they are drawn from a group of 
diffeomorphisms which is much larger than the Euclidean group and therefore 
more challenging to represent computationally. 

By relating the study of shape to the study of symmetry, Leyton is able to 
argue that symmetry is crucial to cognitive processing. Thus perception is seen by 
Leyton as the creation of a causal history which explains the sense-data in terms 
of a process that extends over time — what we would call a general transfer 
function. Thus, in our terms, Leyton sees a primary skill of the human mind 
as being the synthesis of general transfer functions. Crucial to this skill is the 
understanding of the characteristic symmetries of such functions. We perceive a 
pot in terms of the rotational symmetry induced by the potter’s wheel. 

In this paper we shall draw upon the concept of “naive physics” expounded
by Hayes [4] in the general sense that it can be desirable to make use of a sim- 
plified physics for understanding the functioning of biological adaptive systems. 
Such simplified physical systems may in general be characterised by having more 
restricted symmetry groups than do the standard models of physics. 

While our formalism does not in general require that mappings be differen- 
tiable, important examples are differentiable and characterisable by differential 
equations. In that case invariance under groups of symmetries is recognised to 
be an important characteristic of a set of equations, see [2]. Our treatment of 
output symmetries is related to the topological concept of homotopy. 

A discussion of the Missionaries and Cannibals problem is found in Amarel [1];
our treatment of the quotient GTF of this problem is closely related to his.

Over discrete domains, our work has a strong relationship with model check- 
ing. Chapter 14 of [3] entitled “Symmetry” contains a definition of an auto- 
morphism group of a Kripke structure, and develops the concept of a quotient 
structure. This is closely related to our discussion of the symmetries of a GTF.
One view of our work is that it is an approach to building a synthesis of classical
Control Theory (over a continuous domain) with model checking (over a discrete
domain).

3 Notation: Operators and Groups 

By an operator $\sigma$ we mean an entity drawn from an arbitrary set $\Sigma$ (which may be infinite). A multiplication is defined on operators. If $\sigma_1$ and $\sigma_2$ are operators, then their product is written $\sigma_1\sigma_2$.

This product is associative, that is

\[ (\sigma_1\sigma_2)\sigma_3 = \sigma_1(\sigma_2\sigma_3). \]

The identity operator denoted by 1 has the property that $1\sigma = \sigma = \sigma 1$ for all operators $\sigma$.

If $\sigma$ is an operator it may have an inverse, written $\sigma^{-1}$, with the property that $\sigma\sigma^{-1} = \sigma^{-1}\sigma = 1$.

A set $\Sigma$ of operators for which every operator has an inverse, which includes the identity operator, and which is closed under multiplication and inverse is called a group of operators.

Given a set $U$ and a set $\Sigma$ of operators, we say that the operators of $\Sigma$ act on $U$ if each operator of $\Sigma$ is associated with a mapping^ from $U$ to $U$. If $\sigma$ is an operator, we write the effect of applying $\sigma$ to $u \in U$ by $u.\sigma$. We require that

— the identity operator acts as the identity mapping, that is $u.1 = u$.

— the product of operators acts as the product of the corresponding mappings, so that $u.(\sigma_1\sigma_2) = (u.\sigma_1).\sigma_2$.

— if an operator $\sigma$ has an inverse, then the mapping corresponding to $\sigma$ is invertible, and the mapping $u \mapsto u.\sigma^{-1}$ is the mapping inverse to $u \mapsto u.\sigma$, that is $(u.\sigma^{-1}).\sigma = u$.

If $U$ is any finite set, then we denote the symmetric group of all permutations of the members of $U$ by $S_U$. We shall use $S_U$ as an operator set.
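The action requirements above can be illustrated on a small symmetric group. In this sketch a permutation of $\{0, \ldots, n-1\}$ is stored as a tuple of images, and the product is taken in the "apply the left factor first" order so that it matches the right-action rule for products. All function names are hypothetical and chosen for this example.

```python
from itertools import permutations


def compose(s1, s2):
    """Product s1 s2 under the right-action convention: apply s1 first, then s2."""
    return tuple(s2[s1[u]] for u in range(len(s1)))


def inverse(s):
    """Inverse permutation, so that compose(s, inverse(s)) is the identity."""
    inv = [0] * len(s)
    for u, image in enumerate(s):
        inv[image] = u
    return tuple(inv)


def act(u, s):
    """The effect u.s of applying operator s to the element u of U = {0, ..., n-1}."""
    return s[u]


def symmetric_group(n):
    """All permutations of {0, ..., n-1}, each stored as a tuple of images."""
    return list(permutations(range(n)))


def satisfies_action_axioms(n):
    """Check the identity, product and inverse requirements for the group on n points."""
    group = symmetric_group(n)
    identity = tuple(range(n))
    for u in range(n):
        if act(u, identity) != u:
            return False
        for s1 in group:
            if act(act(u, inverse(s1)), s1) != u:
                return False
            for s2 in group:
                if act(u, compose(s1, s2)) != act(act(u, s1), s2):
                    return False
    return True
```

Keeping the operators (tuples) separate from the set they act on mirrors the text's point that operators can be discussed before the domains are introduced.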

4 General Transfer Functions and Their Symmetries 

Transfer functions have long been used by control theorists to characterise the 
behaviour of systems. Essentially, a transfer function characterises the input- 
output relationship of a system that may have internal state (for example the 
charge on the capacitor of an integrator). As such, they are necessarily function- 
als or higher order functions, mapping from a specification of how the input to 
a (sub)system evolves over time to a specification of how its output evolves over 
time. A specification of initial state is also required. 

It should be noted that we regard a Finite State Automaton (FSA) with 
outputs as a generalised transfer function, so that we are not restricting our- 
selves to mappings that are differentiable or even continuous. We may regard 
the evolution of a system over time as discrete or continuous (but not, in our 
current formulation, hybrid). 

^ Operators are related to the concept of a universal algebra. They also resemble the methods of object-oriented programming languages. We chose to speak of operators rather than work with sets of mappings over our domains for much the same reason that object-orientation is used: we can discuss the operators and their properties before introducing all the sets they operate on. For example, bilateral symmetry, in which a reflection about a central plane is a symmetry operator, is very common. Since a reflection of a reflection is the identity operation, the bilateral symmetry operator $\sigma_{LR}$ should obey the law $\sigma_{LR}^2 = 1$. But how it operates on a given domain of values is application specific.

^ Group theorists identify symmetric groups by their isomorphism class, and speak, for example, of $S_3$ as the group of all permutations on 3 elements. This identification is inappropriate for our purposes.
Definition 4.1 We use $\mathcal{I}$ to denote an index-set used to characterise the passage of time. $\mathcal{I}$ will be the real numbers $\geq 0$ (for a continuous system) or the integers $\geq 0$ (for a discrete system).

In general, the index set will be totally ordered, with a least element written as 0.



4.1 Domains of Values 

Control systems have traditionally been defined to act on real valued variables 
or vectors thereof. Here we use the concept of a domain of values as a general 
set on which components of a control system may act. In particular, elements 
of domains of values are not necessarily real numbers. We can think of discrete 
domains as supporting the concept of logical sensors and logical behaviours ab- 
stracted from actual sensors and behaviours by software layers. 

We shall generally use $U$ or $U_{in}$ to denote a domain of inputs for a given transfer function. We may use $X$ or $U_{out}$ to denote a domain which is an output of a given transfer function.

We extend operators over a domain to act on functions over that domain. Thus if $U$ is a domain, and $\sigma$ acts on $U$, and $u : \mathcal{I} \to U$, then $u.\sigma$ is defined by $(u.\sigma)(t) = u(t).\sigma$.



4.2 Initialiser Domains 

A general transfer function may have an initial state, which must be specified. 
For example, in a classical control system, integrators may be given an initial 
value determined by the system designer. Likewise a robot may be activated in 
one of many possible initial states. We use P to denote a set of initialiser values, 
referring to P as the initialiser domain. 

In order to support the cascading of generalised transfer functions, initialiser values need to be finite sequences, possibly empty, possibly of length one. Cascading general transfer functions will involve concatenation of their initialiser values, which we’ll write as a product $p_1 p_2$. For a given initialiser domain $P$ the sequences must be all of the same length.

Definition 4.2 Let $U$ and $X$ be domains which we will call the input domain and the output domain respectively. Let $P$ be an initialiser domain. Then a general transfer function $T$ is a mapping $T : ((\mathcal{I} \to U) \times P) \to (\mathcal{I} \to X)$.

We will use the abbreviation “GTF” for “general transfer function”. The 
idea is that a GTF maps an input function (of time) whose values range over an 
input-space U to an output function (of time) whose values range over an output 
space X. However this mapping may also depend on initialisation determined 
by a member of P. 
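A general transfer function is a higher-order function: it takes an input function of time together with an initialiser and returns an output function of time. The sketch below is one hypothetical instance over a discrete index set, a summing integrator whose initialiser is simplified to a single number rather than a sequence; the names `Signal` and `integrator_gtf` are assumptions of this example.

```python
from typing import Callable

# A signal over the discrete index set {0, 1, 2, ...}.
Signal = Callable[[int], float]


def integrator_gtf(u: Signal, p: float) -> Signal:
    """Map an input signal u and an initial value p to the output signal
    y(t) = p + u(0) + ... + u(t - 1)."""
    def output(t: int) -> float:
        return p + sum(u(k) for k in range(t))
    return output
```

For a constant input of 1.0 and initialiser 5.0, the output at time 0 is the initialiser itself and the output at time 3 is 8.0, so this particular GTF starts from exactly the value it was initialised with.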
Definition 4.3 A GTF $T$ with preset domain $P$ and output domain $X$ is said to be presettable if

\[ T(u, p)(0) = p. \]

Necessarily $P \subseteq X$; thus the initial output of a presettable GTF can be directly specified to be $p \in X$.

Definition 4.4 An invertible operator $\sigma$ acting on input-space $U$ is said to be an input-symmetry of a transfer-function $T$ if $T(u.\sigma, x_0) = T(u, x_0)$ for all inputs $u : \mathcal{I} \to U$.

The idea being captured here is that there may be transformations of the 
input space to which a given transfer function is insensitive. This can amount 
to saying that there is state which is hidden, at least from the given function. 
Whether this is a problem depends on whether what is hidden is relevant. A 
more positive view of input symmetries is that they provide a way of defining 
properties of the input space of a transfer function which may be kept invariant. 

For example, suppose we have a robot with a sideways pointing range sensor. 
Suppose the range sensor is sensing a planar wall. Then the transfer function for 
this sensor (mapping from robot coordinates to a real-valued distance) has an 
input symmetry with respect to translation parallel to the wall, thereby enabling 
wall-following behaviour.
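The range-sensor example can be sketched as follows, under simplifying assumptions that are not in the text: the wall is the line $y = 0$, the robot pose is reduced to a position $(x, y)$, the sensor points straight at the wall, and the GTF is memoryless. The names `translate_along_wall`, `lift` and `range_sensor_gtf` are hypothetical.

```python
def translate_along_wall(a):
    """Operator on robot positions: shift by a parallel to the wall y = 0."""
    def sigma(position):
        x, y = position
        return (x + a, y)
    return sigma


def translate_towards_wall(a):
    """Operator on robot positions: shift by a perpendicular to the wall."""
    def sigma(position):
        x, y = position
        return (x, y + a)
    return sigma


def lift(u, sigma):
    """Extend an operator to input functions: (u.sigma)(t) = u(t).sigma."""
    return lambda t: sigma(u(t))


def range_sensor_gtf(u, p=()):
    """Memoryless GTF: the measured distance to the wall at each time t."""
    return lambda t: abs(u(t)[1])
```

Comparing `range_sensor_gtf(u)` with `range_sensor_gtf(lift(u, translate_along_wall(a)))` at any time gives the same distance, whereas a perpendicular shift changes it; only the former is an input symmetry in the sense of Definition 4.4.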

Definition 4.5 An output-symmetry of a presettable transfer-function $T$ with output domain $X$ is an invertible operator $\sigma$ which acts on $X$ and for which

\[ T(u, x_0.\sigma) = T(u, x_0).\sigma. \]



5 Examples of GTF’s 

Typically, the GTF that represents the electro-mechanics of a robot will be represented as a presettable transfer function if we regard it as being possible to initialise its position. On the other hand, incorporating an output-transducer as complex as a digitised TV camera leads to a system that is not presettable: we can define images (that is, luminance functions defined on an image plane) which are not images of any scene realisable in a given world; to define $P \subseteq X$ would characterise the system as one which could be set up with an exactly determined image on the image plane, something we can’t do in practice, for we don’t have an exact model of the imaging process.



A Mobile Robot. Consider, for example, a mobile robot that is placed, initially at rest, on an infinite plane. Its input space is the cartesian product $U_{MR} = \mathbb{R} \times \mathbb{R}$. We shall write $(u_a, u_c)$ for a typical member of $U_{MR}$: $u_a$ is acceleration, and $u_c$ is path-curvature, determined by a conventional steering mechanism. Its output space is $X_{MR} = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$. We shall write $(x, y, \theta)$ for a typical member




Symmetries in World Geometry and Adaptive System Behaviour 



275 



of $X_{MR}$, where $(x, y)$ is the position of the robot in the plane, and $\theta$ is its orientation. All of these are functions of time.

Its GTF, $T_{MR}$, can be characterised by the differential equations

\[ \frac{dx}{dt} = v(t)\cos\theta, \qquad \frac{dy}{dt} = v(t)\sin\theta, \]

\[ \frac{d\theta}{dt} = v(t)\,u_c(t), \qquad \frac{dv}{dt} = u_a(t). \]

For example, if we apply $T_{MR}$ to the simple input function defined by $u_{simple}(t) = (0.1, 0)$, then

\[ T_{MR}(u_{simple}, (10, 5, 0))(t) = (0.05t^2 + 10, 5, 0), \]

that is, uniform acceleration along a straight line through the starting point $(10, 5, 0)$ and parallel to the X-axis.
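The worked example can be reproduced numerically. The sketch below integrates the robot model with a forward Euler scheme, assuming that the heading changes at the rate $v\,u_c$ (speed times path-curvature) and that the speed changes at the rate $u_a$; the function name `mobile_robot_gtf` and the step size are choices of this illustration.

```python
import math


def mobile_robot_gtf(u, p, dt=1e-3):
    """Approximate GTF of the mobile robot.

    u maps a time t to the input pair (u_a, u_c); p = (x0, y0, theta0) is the
    initial pose. The robot starts at rest.
    """
    def output(t):
        x, y, theta = p
        v = 0.0
        steps = int(round(t / dt))
        for k in range(steps):
            u_a, u_c = u(k * dt)
            x += v * math.cos(theta) * dt
            y += v * math.sin(theta) * dt
            theta += v * u_c * dt   # heading rate = speed * curvature (assumed)
            v += u_a * dt           # speed rate = commanded acceleration
        return (x, y, theta)
    return output
```

With the constant input $(0.1, 0)$ and initial pose $(10, 5, 0)$, the pose at $t = 10$ stays on the line $y = 5$ with zero heading, and $x$ comes out close to $0.05 \cdot 10^2 + 10 = 15$, up to the discretisation error of the Euler scheme.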



The Guards and Prisoners Problem. Three guards and three prisoners are on the left bank of a river, and need to cross over to the right. A boat is available on the left bank. It holds two people. Prisoners must not be allowed to outnumber guards. How can the party cross the river?^

To express the problem formally^ we will need the following notation:

— If $s$ is a finite sequence (or tuple), then we use $s_{i \mapsto v}$ to mean that finite sequence which differs from $s$ only at index $i$, where it has the value $v$. This is extended to multiple successive modifications. For example $s_{i \mapsto v,\, j \mapsto w}$ is $s'_{j \mapsto w}$ where $s' = s_{i \mapsto v}$. We write $()$ for the empty sequence.

— There is a set of 2 boolean values $\{t, f\}$.

— There is a set of 3 guards $\{g1, g2, g3\}$.

— There is a set of 3 prisoners $\{p1, p2, p3\}$.

— An occupant can be EITHER a guard OR a prisoner OR $n$ (indicating that a place is unoccupied). The first 3 places on each bank will, if occupied, be occupied by a guard. The remaining 3 places will, if occupied, be occupied by a prisoner.

— A bank is a sextuple of occupants.

— A state can be EITHER

• A triple $(c, bank_1, bank_2)$ where the condition $c$ is a boolean indicating whether the boat is on the left bank ($c = t$) or the right bank ($c = f$), and $bank_1$ and $bank_2$ specify the occupants of the left and right banks respectively.

^ Following Amarel, we are deliberately not using a concise representation of the state- 
space, for we wish to discuss how state-space can be contracted by the recognition of 
its symmetries. Our representation is arguably a natural one for a graphical presen- 
tation of the problem which is to be solved by a human interacting with a computer. 
This formalisation was guided by a definition of aspects of the problem written in
the SML language [6].
• OR $\mathbf{b}$, indicating a bad state resulting from a physically impossible transition such as trying to move the occupant of an empty location. Other states are referred to as “good”.

We denote the set of states^ by $U_{GP}$.

To specify a move, we select one or two “places”, that is indices into the state vector, whose occupants may cross the river to the opposite bank. If any of the selected places is unoccupied, the move will be deemed physically impossible.

\[ U_{move} = \{(i) \mid i \in 1 \ldots 6\} \cup \{(i, j) \mid i, j \in 1 \ldots 6,\ i \neq j\} \]

We may now define the presettable GTF which characterises the physically possible moves of the problem. We’ll call it $T_{moveGP}$. To define it, we need a $Move$ function, which moves a single occupant from one bank to the other. Here $Move(i, x) = x'$ means that the output state $x'$ is obtained from the output state $x$ by moving the occupant $o = b_i$ to the other side of the river, where $b$ is the appropriate bank, indexed by $i$.

There are three main cases:

— Case 1: $x = \mathbf{b}$. In this case $Move(i, x) = \mathbf{b}$. In other words, once a state is bad, it remains so.

— Case 2: $x = (t, b, b')$ where $b, b'$ are banks. So the boat is on the left bank since the first component of the state is $t$. Let $o = b_i$. Let $o' = b'_i$. There are three sub-cases:

• Case 2.1: $o = n$. To move a non-existent occupant is physically impossible, so $Move(i, x) = \mathbf{b}$.

• Case 2.2: $o' \neq n$. To move an occupant into an occupied location is impossible^, so $Move(i, x) = \mathbf{b}$.

• Case 2.3: $o \neq n$. $Move(i, x) = (f, b_{i \mapsto n}, b'_{i \mapsto o})$.

— Case 3: $x = (f, b, b')$ where $b, b'$ are banks. Let $o = b_i$. Let $o' = b'_i$. There are three sub-cases:

• Case 3.1: $o' = n$. To move a non-existent occupant is physically impossible, so $Move(i, x) = \mathbf{b}$.

• Case 3.2: $o \neq n$. To move an occupant into an occupied location is impossible, so $Move(i, x) = \mathbf{b}$.

• Case 3.3: $o' \neq n$. $Move(i, x) = (t, b_{i \mapsto o'}, b'_{i \mapsto n})$.

Now let’s define $MoveAll$, which operates on the members of $U_{move}$:

\[ MoveAll((), x) = x, \qquad MoveAll((i, u_1 \ldots u_n), x) = Move(i, MoveAll((u_1 \ldots u_n), x)). \]
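The move semantics can be sketched in code. This is not a transcription of the definitions above: to make two-place moves carry both occupants in the same direction, the sketch flips the boat flag once per crossing rather than once per occupant, which is a design choice of this illustration. Places are indexed from 0 rather than 1, the check that guards and prisoners sit in the right places is omitted, and the names `BAD`, `EMPTY`, `replace_at`, `move_all` and `t_move_gp` are hypothetical.

```python
BAD = "bad"      # the bad state
EMPTY = "n"      # marker for an unoccupied place


def replace_at(s, i, v):
    """The tuple that differs from s only at index i, where it has the value v."""
    return s[:i] + (v,) + s[i + 1:]


def move_all(places, x):
    """Carry the occupants of the selected places across the river in one crossing."""
    if x == BAD or not places:
        return x
    boat_left, left, right = x
    src, dst = (left, right) if boat_left else (right, left)
    for i in places:
        if src[i] == EMPTY or dst[i] != EMPTY:
            return BAD   # nobody to move, or destination place already occupied
        src, dst = replace_at(src, i, EMPTY), replace_at(dst, i, src[i])
    return (False, src, dst) if boat_left else (True, dst, src)


def t_move_gp(u, p):
    """GTF sketch: u maps a time step to a tuple of places, p is the initial left bank."""
    def output(t):
        x = (True, tuple(p), (EMPTY,) * len(p))
        for k in range(t):
            x = move_all(u(k), x)
        return x
    return output
```

Starting with everybody on the left bank, `move_all((0, 3), start)` sends the first guard and the first prisoner to the right bank and leaves the boat there; asking next for a place that is empty on the right bank yields `BAD`.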

® Good states and bad states are specified by a discriminated union in the SML formu- 
lation. The two cases given correspond to the two cases in the datatype declaration 
in the program 

® With a standard initial condition in which everybody is on one bank it’s not possible 
to move an occupant into an occupied location. 
We can now define the GTF $T_{moveGP}$ with index-set the non-negative integers as follows:

\[ T_{moveGP}(u, (p_1, p_2, p_3, p_4, p_5, p_6))(0) = (t, (p_1, p_2, p_3, p_4, p_5, p_6), (n, n, n, n, n, n)) \]

provided that $p_1 \ldots p_3 \in Guards$ and $p_4 \ldots p_6 \in Prisoners$; otherwise $T_{moveGP}(u, (p_1, p_2, p_3, p_4, p_5, p_6))(0) = \mathbf{b}$. That is, initially everybody is on the left bank.

For $t > 0$ let’s suppose that $x = T_{moveGP}(u, p)(t - 1)$. Then

\[ T_{moveGP}(u, p)(t) = MoveAll(u(t - 1), x). \]

To meet the conditions of the problem we can classify a state as either legal
$\mathbf{l}$ (so that neither on the left bank nor on the right bank are the guards
outnumbered), illegal $\mathbf{i}$ (in which the guards are outnumbered on the left bank or on
the right bank), or terminal $\mathbf{t}$, which is a legal state with everybody on the right
bank.

The domain

$U_{eval} = \{\mathbf{l}, \mathbf{i}, \mathbf{t}\}$

will be the output domain of a GTF which classifies a given situation as
either legal, illegal or terminal.

We can also define the GTF $T_{evalGP}$ which evaluates a state arising from a
move in the guards-and-prisoners problem.

To evaluate $T_{evalGP}(u, p)(t)$, let $(c, b, b') = x(t)$. Then

- $T_{evalGP}(u, p)(t) = \mathbf{t}$ if $b = \emptyset$
- $T_{evalGP}(u, p)(t) = \mathbf{i}$ if $OutNumbered(b) = t$
- $T_{evalGP}(u, p)(t) = \mathbf{l}$ otherwise

The $OutNumbered$ function applied to a bank $b$ evaluates to $t$ if there are
guards on $b$, and they are fewer in number than the prisoners on $b$.
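The classification can be sketched in Python as follows. The encoding is an assumption made for illustration only: a bank is a tuple of six places holding a label such as "g1" or "p2", with `None` standing for the empty occupant n, and a good state is a triple (boat, left bank, right bank). Following the verbal description of an illegal state, the sketch tests both banks for outnumbered guards.

```python
EMPTY = None  # stands for 'n', an unoccupied place


def out_numbered(bank):
    """True if there are guards on the bank and they are fewer than the prisoners."""
    guards = sum(1 for o in bank if o is not EMPTY and o.startswith("g"))
    prisoners = sum(1 for o in bank if o is not EMPTY and o.startswith("p"))
    return 0 < guards < prisoners


def evaluate(state):
    """Classify a good state (boat, left_bank, right_bank) as 't', 'i' or 'l'."""
    _boat, left, right = state
    if all(o is EMPTY for o in left):
        return "t"  # terminal: everybody is on the right bank
    if out_numbered(left) or out_numbered(right):
        return "i"  # illegal: the guards are outnumbered on some bank
    return "l"      # legal


initial = (True, ("g1", "g2", "g3", "p1", "p2", "p3"), (EMPTY,) * 6)
one_guard_gone = (False, (EMPTY, "g2", "g3", "p1", "p2", "p3"),
                  ("g1",) + (EMPTY,) * 5)
```

With this encoding the initial state is legal, whereas sending a single guard across leaves two guards with three prisoners and is therefore illegal.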



Input Symmetries of $T_{evalGP}$. Suppose our set of operators $\Sigma$ contains the
symmetric group $S_{Guards}$, which acts on the space $U_{GP}$ by permuting the guards
on each bank. Then $S_{Guards}$ is a group of input symmetries of $T_{evalGP}$. Likewise
if $\Sigma$ contains the symmetric group $S_{Prisoners}$ then $S_{Prisoners}$ is a group of input
symmetries of $T_{evalGP}$.

That is to say, if we take any member of $U_{stateGP}$ and permute the prisoners
and/or the guards, that state will receive the same evaluation under $T_{evalGP}$.



5.1 Full Symmetries 

We have seen so far symmetries of the inputs and of the outputs of GTF’s.
However it is frequently the case that a GTF has a symmetry that affects both
input and output. For example, any device with a bilateral symmetry will have
a symmetry operator that, in some sense, interchanges left and right. Applying
such an operator on the input space (so that left and right are interchanged on
input) will give rise to an output that has left and right interchanged. A left-right
interchange on the preset specification will also be needed.

For example, in the problem of balancing an inverted pendulum, a left-right 
interchange will involve changing the sign of the input command to the motor 
driving the system. Provided the preset (the initial position and velocity of the 
pendulum) is also mapped by a left-right interchange, the output behaviour will 
also exhibit a left-right interchange. 

In our mobile robot example the transformation $u_c \mapsto -u_c$ is a left-right
symmetry operator applied to the input. The output-space, which is also the
preset-space, may be mapped by $y \mapsto -y$, $\theta \mapsto -\theta$. If we apply these mappings
to the input-space and the output-space, the behaviour of the system is described
by the same transfer function. This is not the only such symmetry operator:
any reflection operator on the output-space gives rise to a left-right symmetry,
provided we correctly map $\theta$.
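This left-right symmetry can be checked numerically. The sketch below assumes a simple unicycle model with constant forward speed, turn-rate command $u_c$ and forward-Euler integration; this model and all names in the code are illustrative assumptions and are not taken from the robot model used elsewhere in the paper.

```python
import math


def simulate(u, preset, v=1.0, dt=0.01):
    """Forward-Euler rollout of an assumed unicycle; u is a list of turn-rate commands."""
    x, y, theta = preset
    out = []
    for uc in u:
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += uc * dt
        out.append((x, y, theta))
    return out


def mirror_input(u):
    """Left-right interchange on the input: u_c -> -u_c."""
    return [-uc for uc in u]


def mirror_state(s):
    """Left-right interchange on preset and output: y -> -y, theta -> -theta."""
    x, y, theta = s
    return (x, -y, -theta)


u = [math.sin(0.05 * k) + 0.3 for k in range(500)]
preset = (0.0, 0.5, 0.2)

lhs = simulate(mirror_input(u), mirror_state(preset))   # mirrored input and preset
rhs = [mirror_state(s) for s in simulate(u, preset)]    # mirrored output
max_gap = max(abs(a - b) for sa, sb in zip(lhs, rhs) for a, b in zip(sa, sb))
```

For this model `max_gap` stays at the level of floating-point rounding, so mirroring the input and preset has the same effect as mirroring the output.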

Definition 5.1 Let $T$ be a GTF. Let $\sigma$ be an invertible operator which acts
on the input domain $U_{in}$ of $T$, the output domain $U_{out}$ of $T$, and the preset
domain $P$ of $T$. We say that $\sigma$ is a full symmetry of $T$ if

$T(u.\sigma, p.\sigma) = T(u, p).\sigma$

Proposition 5.2 Let $T$ be a GTF with input domain $U_{in}$ and output domain
$U_{out}$ and preset domain $P$. Let $\Sigma$ be a set of operators acting on these
domains. Then the set of full symmetry operators on $T$ forms a group.

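Proof: Let $\sigma_1, \sigma_2 \in \Sigma$ be full symmetry operators on $T$. Then

$T(u.(\sigma_1\sigma_2), p.(\sigma_1\sigma_2)) = T((u.\sigma_1).\sigma_2, (p.\sigma_1).\sigma_2)$
$= T(u.\sigma_1, p.\sigma_1).\sigma_2$
$= (T(u, p).\sigma_1).\sigma_2$
$= T(u, p).(\sigma_1\sigma_2)$

Hence $\sigma_1\sigma_2$ is a full symmetry operator on $T$.

Let $\sigma$ be a full symmetry operator on $T$. Then

$T(u.\sigma^{-1}.\sigma, p.\sigma^{-1}.\sigma) = T(u.\sigma^{-1}, p.\sigma^{-1}).\sigma$

Now consider

$T(u.\sigma^{-1}.\sigma, p.\sigma^{-1}.\sigma).\sigma^{-1} = T(u.\sigma^{-1}, p.\sigma^{-1}).\sigma.\sigma^{-1}$
$= T(u.\sigma^{-1}, p.\sigma^{-1}).(\sigma\sigma^{-1})$
$= T(u.\sigma^{-1}, p.\sigma^{-1})$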



Symmetries in World Geometry and Adaptive System Behaviour 279 



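However

$T(u.\sigma^{-1}.\sigma, p.\sigma^{-1}.\sigma).\sigma^{-1} = T(u.(\sigma^{-1}\sigma), p.(\sigma^{-1}\sigma)).\sigma^{-1}$
$= T(u.1, p.1).\sigma^{-1}$
$= T(u, p).\sigma^{-1}$

So we’ve shown that

$T(u.\sigma^{-1}, p.\sigma^{-1}) = T(u, p).\sigma^{-1}$

Hence $\sigma^{-1}$ is a full symmetry operator on $T$.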

We will write $\mathcal{G}_{\Sigma}T$ for the group of full symmetries of $T$.

We can regard input symmetries of a GTF as a special case of full symmetries 
in which the operator acts as the identity operation on the output and preset 
spaces. Likewise we can regard output symmetries of a presettable GTF as a 
special case of full symmetries in which the operator acts as the identity operation 
on the input space. 

In the Prisoners and Guards world, $\Sigma = S_{Guards} \cup S_{Prisoners} \cup S_{\{1 \ldots 3\}} \cup S_{\{4 \ldots 6\}}$,
so that $\Sigma$ includes operators permuting the guards, the prisoners, the
guards’ places and the prisoners’ places. All of the operator subgroups
above naturally map the domains $U_{GP}$, $U_{move}$ and (trivially) $U_{eval}$. Each is a
group of full symmetries of the GTF’s $T_{moveGP}$ and $T_{evalGP}$.

6 Cascaded GTF’s 

Definition 6.1 A transfer function $T$ is said to be a transducer if there is a
function $f$ for which $T(u, x_0)(t) = f(u(t))$.

Thus a transducer (in our sense) is a transfer function whose output depends
only on the instantaneous value of its input.

Definition 6.2 The identity transducer $I$ is the map defined by

$I(u, ()) = u$

Definition 6.3 Let $T_1, T_2$ be general transfer functions. Then the product $T_1T_2$
is defined by

$(T_1T_2)(u, p_1p_2) = T_1(T_2(u, p_2), p_1)$

Note that the factorisation of a sequence $p$ into $p_1p_2$ is unique because the
sequence-length in a given initialiser domain is fixed. Clearly, the product of
GTFs is a GTF.
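The product can be illustrated with a small Python sketch. It assumes, purely for illustration, that signals are finite lists and that each transfer function carries the fixed length of its preset sequence; the class `GTF` and the example systems `accumulator` and `doubler` are hypothetical names, not part of the formal development.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GTF:
    preset_len: int                       # fixed length of the preset sequence
    fn: Callable[[list, tuple], list]     # maps (input signal, preset) to output signal

    def __call__(self, u, p=()):
        assert len(p) == self.preset_len
        return self.fn(u, tuple(p))


def product(t1, t2):
    """(T1 T2)(u, p1 p2) = T1(T2(u, p2), p1)."""
    def fn(u, p):
        # The split is unique because both preset lengths are fixed.
        p1, p2 = p[:t1.preset_len], p[t1.preset_len:]
        return t1(t2(u, p2), p1)
    return GTF(t1.preset_len + t2.preset_len, fn)


identity = GTF(0, lambda u, p: list(u))   # I(u, ()) = u


def _accumulate(u, p):
    total, out = p[0], []
    for value in u:
        total += value
        out.append(total)
    return out


accumulator = GTF(1, _accumulate)                           # preset is the initial total
doubler = GTF(0, lambda u, p: [2 * value for value in u])   # a transducer with f(v) = 2v

cascade = product(doubler, accumulator)
result = cascade([1, 2, 3], (10,))
```

Here `result` is `[22, 26, 32]`: the accumulator starts from the preset 10 and produces 11, 13, 16, which the transducer then doubles.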



Proposition 6.4 The product of transfer functions is associative, with the identity
transducer as its identity.

Proof:

$(T_1(T_2T_3))(u, p_1p_2p_3) = T_1((T_2T_3)(u, p_2p_3), p_1) = T_1(T_2(T_3(u, p_3), p_2), p_1)$

while

$((T_1T_2)T_3)(u, p_1p_2p_3) = (T_1T_2)(T_3(u, p_3), p_1p_2) = T_1(T_2(T_3(u, p_3), p_2), p_1)$

Moreover $(IT)(u, p) = I(T(u, p), ()) = T(u, p)$ and $(TI)(u, p) = T(I(u, ()), p) = T(u, p)$.

Proposition 6.5 Let $T_1, T_2$ be GTF’s for which $U_1$ is the input space of $T_1$, $U_2$
is the output space of $T_1$ and the input space of $T_2$, while $U_3$ is the output space
of $T_2$. Let $\Sigma_1$ be a group of full symmetries of $T_1$, while $\Sigma_2$ is a group of full
symmetries of $T_2$. Then $\Sigma_1 \cap \Sigma_2$ is a group of full symmetries of $T_2T_1$.

Proof: Let $\sigma \in \Sigma_1 \cap \Sigma_2$. Consider

$(T_2T_1)(u.\sigma, (p_1p_2).\sigma) = T_2(T_1(u.\sigma, p_1.\sigma), p_2.\sigma)$
$= T_2(T_1(u, p_1).\sigma, p_2.\sigma)$
$= T_2(T_1(u, p_1), p_2).\sigma$
$= (T_2T_1)(u, p_1p_2).\sigma$

Thus $\sigma$ is a full symmetry of $T_2T_1$.

For example consider a mobile robot, equipped with a camera, on an infinite 
plane on which a straight line is painted. A symmetry group of the whole system 
is the intersection of the output-symmetries of the mechanics with the input 
symmetries of the camera as it views the line. 



7 Taking the Quotient Simplifies Transfer Functions 

Proposition 7.1 Any group of operators $\Sigma'$ defines an equivalence relationship
on a domain on which it operates:

$u \equiv u' \iff \exists \sigma \in \Sigma', u' = u.\sigma$

Definition 7.2 Let $U$ be a domain, $P$ be a preset domain, $\Sigma' \subseteq \Sigma$ a group
of operators on $U$. Then we write $U/\Sigma'$ for the set of classes of members of $U$
equivalent under $\Sigma'$. Also we write $(U \times P)/\Sigma'$ for the set of classes of members
of $U \times P$ equivalent under $\Sigma'$ ^.

^ We are extending the operator set to act on the cartesian product in the obvious
way.

We write $\bar{u}$ for the equivalence class corresponding to $u$. The mapping $u \mapsto \bar{u}$
is treated as the operator $1/\Sigma'$.

For finite domains, the advantage of going to quotient domains is that the 
size of the search space required to invert a transfer function for the purpose of 
creating a regulator or an open-loop controller is reduced. In [11] being able to
take the quotient of the Special Euclidean Group by the group of translations 
proved a useful way of simplifying the robotic assembly problems studied in that 
paper. 

Proposition 7.3 Let $\Sigma' \subseteq \Sigma$ be a group of full symmetries of a transfer function
$T$ whose input space is $U$, whose output space is $X$ and whose preset space
is $P$. Let $\theta = 1/\Sigma'$. Then the function $T/\Sigma'$ defined by
$(T/\Sigma')(u.\theta, p.\theta) = T(u, p).\theta$ is a GTF.

Proof: The only thing we have to show is that the mapping is well defined.
Suppose $(u, p) \equiv (u', p')$. Then $u' = u.\sigma$, $p' = p.\sigma$ for some $\sigma \in \Sigma'$, and

$(T/\Sigma')(u'.\theta, p'.\theta) = T(u', p').\theta = T(u.\sigma, p.\sigma).\theta = T(u, p).\sigma.\theta = T(u, p).\theta$



7.1 The Quotient of the Guards and Prisoners Problem 

In taking the quotient domain $U_{move}/\Sigma'$ the guards’ places are equivalent and
so are the prisoners’ places. Thus the possible moves are (1) meaning “1 guard
crosses”, (1, 1) meaning “2 guards cross”, (1, 4) and (4, 1) meaning “1 guard and
one prisoner cross”, (4) meaning “one prisoner crosses” and (4, 4) meaning “two
prisoners cross”. Thus the size of the input space is reduced from 36 for $T_{moveGP}$
to 6 for $T_{moveGP}/\Sigma'$. ®
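The six classes of moves can be enumerated mechanically. The sketch below assumes that a move is a tuple of one or two place indices, that places 1 to 3 belong to guards and places 4 to 6 to prisoners, and that each class is named by its smallest place (1 or 4); it does not filter out tuples that are physically impossible, since only the classes matter here.

```python
from itertools import product

PLACES = range(1, 7)   # places 1..3 hold guards, places 4..6 hold prisoners


def representative(place):
    """Representative of a place's class: 1 for guards' places, 4 for prisoners' places."""
    return 1 if place <= 3 else 4


def quotient_moves(max_boat_load=2):
    """Collect the classes of all move tuples of length 1..max_boat_load."""
    classes = set()
    for size in range(1, max_boat_load + 1):
        for move in product(PLACES, repeat=size):
            classes.add(tuple(representative(i) for i in move))
    return sorted(classes)


moves = quotient_moves()
```

The list `moves` contains exactly the six classes (1), (1, 1), (1, 4), (4), (4, 1) and (4, 4) named in the text.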

Reducing the size of the input domain makes a significant reduction in the 
size of the search-space for a sequence of inputs that will produce the desired 
terminal output t — the fan-out is divided by 6 at each stage. 

Moreover the state-space is also shrunk by taking the quotient. In the original
formulation there were $2 \times (6 \times 2^3)^2 = 4608$ states that could be reached from
possible initial states. These are shrunk down to $3 \times 3 \times 2 = 18$ possible states in
the quotient domain, for any two states that have the same number of guards
on the left bank and have the same number of prisoners on the left bank and
have the boat in the same place will be equivalent under $\Sigma'$.

8 Summary — Future Work 

In this paper we have developed the concept of a generalised transfer function, 
illustrating how the concept encompasses both discrete and continuous systems. 
We have developed the concept of symmetry of a GTF which, in its most general
form, tells us how modifications of the input (and preset) of a GTF will affect
its output. Symmetry thus has the potential to play the role of differentiation in
classical control theory.

® Further condensation of the input space is possible if we note that permutation of
the entries in an input-tuple is an input symmetry of $T_{moveGP}$.

A strong motivation for generalisation is that biological adaptive systems 
appear to operate both in continuous and discrete domains. While there is a 
continuum of configurations of the human body, human communication takes 
place in a discretised vocabulary of words which are used inter alia to discuss 
actions and which arguably may characterise aspects of the mental processes 
underlying actions. 

A potential bridge between continuous and discrete domains is the quotient 
operation, since it supports the collapse of a continuous domain into a discrete 
one, for example on the basis of topological invariants®. 

The value of the generalisation depends on whether it can be used as the basis 
of synthesis and analysis of reactive systems. We might wish to analyse a GTF 
from an extensive or intensive point of view. The extensive (or “white box”)
approach supposes we have a definition of a GTF that is open for inspection, 
so that its symmetries can be inferred from its definition. The intensive (or 
“black box” ) approach requires the formulation of a characterisation of a GTF 
by observing its behaviour. 

One question that has been left unexplored is “Of what class of inputs is a 
given GTF a symmetry operator?” This is crucial to the understanding of linear 
systems whose analysis depends on the observation that a linear GTF is a scale 
symmetry of the exponential function over the complex domain.

Abstract. The paper concerns 2D-3D pose estimation in the algebraic
language of kinematics. The pose estimation problem is modelled on the
basis of several geometric constraint equations. In that way the projective
geometric aspect of the topic is only implicitly represented and thus pose
estimation is a pure kinematic problem. The dynamic measurements of
these constraints are either points or lines. The authors propose the
use of motor algebra to introduce constraint equations, which keep a
natural distance measurement, the Hesse distance. The motor algebra is
a degenerate geometric algebra in which line transformations are linear
ones. The experiments aim to compare the use of different constraints
and different methods of optimally estimating the pose parameters.

1 Introduction 

The paper describes the estimation of pose parameters of known rigid objects in
the framework of kinematics. Pose estimation is a basic visual task. Although
its importance has been recognised for a long time (see e.g. Grimson [5]), and
although an overwhelming number of papers has been published on
that topic [9], up to now there is no unique and general solution of the problem.
Pose estimation means relating several coordinate frames of measurement data
and model data by finding the transformations between them, which can comprise
rotation and translation. Since we assume our measurement data to be 2D and
our model data to be 3D, we are concerned with a 2D-3D pose estimation problem.
Camera self-localization and navigation are typical examples of such types of
problems. The coupling of projective and Euclidean transformations, both with
nonlinear representations in Euclidean space, is the main reason for the difficulty
of solving the pose problem. In this paper we treat pose estimation,
related to the estimation of line motion, as a problem of kinematics. The problem
can be linearly represented in motor algebra [8] or dual quaternion algebra [7].
Instead of using invariances as an explicit formulation of geometry, as has often
been done in projective geometry, we use implicit formulations of geometry
in the form of geometric constraints. We will demonstrate that geometric constraints
are well conditioned, in contrast to invariances.

The paper is organized as follows. In section two we introduce the motor
algebra as a representation frame for geometric entities, geometric constraints,
and Euclidean transformations. In section three we introduce the geometric
constraints and their changes in an observation scenario. Section four is
dedicated to the geometric analysis of these constraints. In section five we show
some results for constraint based pose estimation with real images.

2 The motor algebra in the frame of kinematics 

A geometric algebra $\mathcal{G}_{p,q,r}$ is a linear space of dimension $2^n$, $n = p + q + r$,
with a rich subspace structure, called blades, to represent so-called multivectors
as higher order algebraic entities in comparison to vectors of a vector space
as first order entities. A geometric algebra $\mathcal{G}_{p,q,r}$ results in a constructive way
from a vector space $\mathbb{R}^n$, endowed with the signature $(p, q, r)$, $n = p + q + r$, by
application of a geometric product. The geometric product consists of an outer
($\wedge$) and an inner ($\cdot$) product, whose role is to increase or to decrease the order
of the algebraic entities, respectively.

To make it concrete, a motor algebra is the 8D even algebra $\mathcal{G}^+_{3,0,1}$, derived
from $\mathbb{R}^4$, i.e. $n = 4$, $p = 3$, $q = 0$, $r = 1$, with basis vectors $\gamma_k$, $k = 1, \ldots, 4$, and
the property $\gamma_1^2 = \gamma_2^2 = \gamma_3^2 = +1$ and $\gamma_4^2 = 0$. Because $\gamma_4^2 = 0$, $\mathcal{G}^+_{3,0,1}$ is called a
degenerate algebra. The motor algebra $\mathcal{G}^+_{3,0,1}$ is of dimension eight and spanned
by qualitatively different subspaces with the following basis multivectors:

one scalar: $1$
six bivectors: $\gamma_2\gamma_3, \gamma_3\gamma_1, \gamma_1\gamma_2, \gamma_4\gamma_1, \gamma_4\gamma_2, \gamma_4\gamma_3$
one pseudoscalar: $I = \gamma_1\gamma_2\gamma_3\gamma_4$.

Because $\gamma_4^2 = 0$, also the unit pseudoscalar squares to zero, i.e. $I^2 = 0$.
Remembering that the hypercomplex algebra of quaternions $\mathbb{H}$ represents a 4D
linear space with one scalar and three vector components, it can simply be
verified that $\mathcal{G}^+_{3,0,1}$ is isomorphic to the algebra of dual quaternions $\hat{\mathbb{H}}$ [11].

The geometric product of bivectors $A, B \in \langle \mathcal{G}^+_{3,0,1} \rangle_2$ splits into
$AB = A \cdot B + A \times B + A \wedge B$, where $A \cdot B$ is the inner product, which results in
a scalar $A \cdot B = \alpha$, $A \wedge B$ is the outer product, which in this case results in
a pseudoscalar $A \wedge B = I\beta$, and $A \times B$ is the commutator product, which
results in a bivector $C$, $A \times B = \frac{1}{2}(AB - BA) = C$. In a general sense, all
the entities existing in motor algebra are called motors. They are constituted
by bivectors and scalars. Thus, any geometric entity such as a point, a line, or a plane
has a motor representation. Changing the sign of the scalar and bivector in the
real and the dual parts of the motor leads to the following variants of a motor

$M = (a_0 + a) + I(b_0 + b)$, $\widetilde{M} = (a_0 - a) + I(b_0 - b)$,
$\overline{M} = (a_0 + a) - I(b_0 + b)$, $\overline{\widetilde{M}} = (a_0 - a) - I(b_0 - b)$.

We will use the term motor in a more restricted sense to denote a screw
transformation, that is a Euclidean transformation embedded in motor algebra.
Its constituents are rotation and translation (and dilation in case of non-unit
motors). In line geometry we represent rotation by a rotation line axis and a
rotation angle. The corresponding entity is called a unit rotor, $R$, and reads as
follows

$R = r_0 + r_1\gamma_2\gamma_3 + r_2\gamma_3\gamma_1 + r_3\gamma_1\gamma_2 = \cos(\frac{\theta}{2}) + \sin(\frac{\theta}{2})\, n = \exp(\frac{\theta}{2} n)$.

Here $\theta$ is the rotation angle and $n$ is the unit orientation vector of the rotation
axis, spanned by the bivector basis.

If on the other hand, $t = t_1\gamma_2\gamma_3 + t_2\gamma_3\gamma_1 + t_3\gamma_1\gamma_2$ is a translation vector in
bivector representation, it will be represented in motor algebra as the dual part
of a motor, called translator $T$ with

$T = 1 + I\frac{t}{2} = \exp(\frac{t}{2} I)$.

Thus, a translator is also a special kind of rotor.

Because rotation and translation concatenate multiplicatively in motor algebra,
a motor $M$ reads

$M = TR = R + I\frac{t}{2}R = R + IR'$.

A motor represents a line transformation as a screw transformation. The line $L$
will be transformed to the line $L'$ by means of a rotation $R_s$ around a line $L_s$
by angle $\theta$, followed by a translation $t_s$ parallel to $L_s$. Then the screw motion
equation as motor transformation reads

$L' = T_s R_s L \widetilde{R}_s \widetilde{T}_s = M L \widetilde{M}$.

For more detailed introductions see [8] and [10]. Now we will introduce the
description of the most important geometric entities [8].
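The motor transformation of a line can be sketched with dual quaternions, to which the motor algebra is isomorphic. The sketch below is written under assumptions that differ from the bivector notation of this section: it uses the standard Hamilton quaternion product, stores the real and dual parts as separate 4-tuples, and takes the moment of a line as the cross product of a point on the line with its direction. All function names are illustrative.

```python
import math


def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def qconj(a):
    return (a[0], -a[1], -a[2], -a[3])


def qadd(*qs):
    return tuple(sum(c) for c in zip(*qs))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def motor(axis, angle, t):
    """Real and dual parts of M = R + eps (t/2) R for a unit rotation axis."""
    s = math.sin(angle / 2)
    r = (math.cos(angle / 2), s * axis[0], s * axis[1], s * axis[2])
    d = tuple(0.5 * c for c in qmul((0.0, *t), r))
    return r, d


def transform_line(m, n, mom):
    """Apply L' = M L M~ to the line n + eps mom, using eps**2 = 0."""
    r, d = m
    rc, dc = qconj(r), qconj(d)
    ln, lm = (0.0, *n), (0.0, *mom)
    real = qmul(qmul(r, ln), rc)
    dual = qadd(qmul(qmul(r, lm), rc),
                qmul(qmul(d, ln), rc),
                qmul(qmul(r, ln), dc))
    return real[1:], dual[1:]


# A line through p0 with direction n, rotated by 90 degrees about z and shifted by t.
n, p0, t = (1.0, 0.0, 0.0), (0.0, 2.0, 1.0), (1.0, -1.0, 3.0)
n2, mom2 = transform_line(motor((0.0, 0.0, 1.0), math.pi / 2, t), n, cross(p0, n))
```

Under these conventions the transformed direction is the rotated direction, and the transformed moment agrees with the moment computed from the rotated and translated point, so a single sandwich product moves the whole line.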

A point $x \in \mathbb{R}^3$, represented in the bivector basis of $\mathcal{G}^+_{3,0,1}$, i.e. $X \in \mathcal{G}^+_{3,0,1}$,
reads $X = 1 + x_1\gamma_4\gamma_1 + x_2\gamma_4\gamma_2 + x_3\gamma_4\gamma_3 = 1 + Ix$.

A line $L \in \mathcal{G}^+_{3,0,1}$ is represented by $L = n + Im$ with the line direction
$n = n_1\gamma_2\gamma_3 + n_2\gamma_3\gamma_1 + n_3\gamma_1\gamma_2$ and the moment $m = m_1\gamma_2\gamma_3 + m_2\gamma_3\gamma_1 + m_3\gamma_1\gamma_2$.

A plane $P \in \mathcal{G}^+_{3,0,1}$ will be defined by its normal $p$ as bivector and by its
Hesse distance to the origin, expressed as the scalar $d = (x \cdot p)$, in the following
way: $P = p + Id$.

In case of screw motions $M = T_s R_s$ not only line transformations can
be modelled, but also point and plane transformations. These are expressed as
follows.

$X' = M X \overline{\widetilde{M}}$, $L' = M L \widetilde{M}$, $P' = M P \overline{\widetilde{M}}$

We will use in this study only point and line transformations because points
and lines are the entities of our object models.

3 Geometric constraints and pose estimation 

First, we make the following assumptions. The model of an object is given by
points and lines in the 3D space. Furthermore we extract line subspaces or points
in an image of a calibrated camera and match them with the model of the object.
The aim is to find the pose of the object from observations of points and lines
in the images at different poses. Figure 1 shows the scenario with respect to
observed line subspaces. The method of obtaining the line subspaces is out of
the scope of this paper. At present we simply obtain line segments by marking
certain image points by hand. To estimate the pose, it is necessary to relate the
observed lines in the image to the unknown pose of the object using geometric
constraints.

The key idea is that the observed 2D entities together with their corresponding
3D entities are constrained to lie on other, higher order entities which result
from the perspective projection. In our considered scenario there are three
constraints which are attributed to two classes of constraints:

1. Collinearity: A 3D point has to lie on a line (projection ray) in the space 

2. Coplanarity: A 3D point or line has to lie on a plane (projection plane). 
With the terms projection ray or projection plane, respectively, we mean the 
image-forming ray which relates a 3D point with the projection center or the in- 
finite set of image-forming rays which relates all 3D points belonging to a 3D line 
with the projection center, respectively. Thus, by introducing these two entities, 
we implicitly represent a perspective projection without necessarily formulating 
it explicitly. The most important consequence of implicitly representing projec- 
tive geometry is that the pose problem is in that framework a pure kinematic 
problem. A similar approach of avoiding perspective projection equations by 
using constraint observations of lines has been proposed in [2]. 

To be more detailed, in the scenario of figure 1 we describe the following
situation: We assume 3D points $A'_i$ and lines $L'_{Ai}$ of an object model. Further
we extract line subspaces $l_{ai}$ in an image of a calibrated camera and match them
with the model.

Three constraints can be depicted:

1. A transformed point, e.g. $A_1$, of the model point $A'_1$ must lie on the projection
ray $L_{a1}$, given by $C$ and the corresponding image point $a_1$.

2. A transformed point, e.g. $A_1$, of the model point $A'_1$ must lie on the projection
plane $P_{12}$, given by $C$ and the corresponding image line $l_{a3}$.

3. A transformed line, e.g. $L_{A3}$, of the model line $L'_{A3}$ must lie on the projection
plane $P_{12}$, given by $C$ and the corresponding image line $l_{a3}$.

Fig. 1. The scenario. The solid lines at the left hand describe the assumptions: the
camera model, the model of the object and the initially extracted lines on the image
plane. The dashed lines at the right hand describe the actual pose of the model, which
leads to the best fit of the object with the actual extracted lines.



constraint    entities                                  dual quaternion algebra   motor algebra
point-line    point $X = 1 + Ix$, line $L = n + Im$     $LX - XL = 0$             $XL - \overline{L}X = 0$
point-plane   point $X = 1 + Ix$, plane $P = p + Id$    $PX - XP = 0$             $PX - \overline{X}\,\overline{P} = 0$
line-plane    line $L = n + Im$, plane $P = p + Id$     $LP - PL = 0$             $LP + P\overline{L} = 0$

Table 1. The geometric constraints expressed in motor algebra and dual quaternion
algebra, respectively.

Table 1 gives an overview of the formulations of these constraints in motor
algebra, taken from Blaschke [4], who used expressions in dual quaternion
algebra. Here we adopt the terms from section 2.

The meaning of the constraint equations is immediately clear. In section 4
we will proceed to analyse them in detail. They represent the ideal situation,
e.g. achieved as the result of the pose estimation procedure with respect to the
observation frame. With respect to the previous reference frame, indicated by
primes, these constraints read

$(M X' \overline{\widetilde{M}}) L - \overline{L} (M X' \overline{\widetilde{M}}) = 0$
$P (M X' \overline{\widetilde{M}}) - \overline{(M X' \overline{\widetilde{M}})}\,\overline{P} = 0$
$(M L' \widetilde{M}) P + P \overline{(M L' \widetilde{M})} = 0.$

These compact equations subsume the pose estimation problem at hand: find
the best motor $M$ which satisfies the constraint. We will get a convex optimization
problem. Any error measure $|\epsilon| > 0$ of the optimization process as actual
deviation from the constraint equation can be interpreted as a distance measure
of misalignment with respect to the ideal situation of table 1. That means e.g.
that the constraint for a point on a line is almost fulfilled for a point near the
line. This will be made clear in the following section 4.

4 Analysis of the constraints 

In this section we will analyse the geometry of the constraints introduced in the 
last section. We want to show that the relations between different entities are 
controlled by their orthogonal distance, the Hesse distance. 

4.1 Point-line constraint 

Evaluating the constraint of a point $X = 1 + Ix$ collinear to a line $L = n + Im$
leads to

$0 = XL - \overline{L}X = (1 + Ix)(n + Im) - (n - Im)(1 + Ix)$
$= n + Im + Ixn - n + Im - Inx = I(2m + xn - nx)$
$= 2I(m - n \times x)$
$\Leftrightarrow 0 = I(m - n \times x).$

Since $I \neq 0$, although $I^2 = 0$, the aim is to analyze the bivector $m - n \times x$.
Suppose $X \notin L$. Then, nonetheless, there exists a decomposition $x = x_1 + x_2$
with $X_1 = (1 + Ix_1) \in L$ and $X_2 = (1 + Ix_2) \perp L$. Figure 2 shows the scenario.

Fig. 2. The line $L$ consists of the direction $n$ and the moment $m = n \times v$. Further, there
exists a decomposition $x = x_1 + x_2$ with $X_1 = (1 + Ix_1) \in L$ and $X_2 = (1 + Ix_2) \perp L$,
so that $m = n \times v = n \times x_1$.

Then we can calculate

$\|m - n \times x\| = \|m - n \times (x_1 + x_2)\| = \|m - n \times x_1 - n \times x_2\| = \|-n \times x_2\| = \|x_2\|.$

Thus, satisfying the point-line constraint means to equate the bivectors $m$ and
$n \times x$, or equivalently to make the Hesse distance $\|x_2\|$ of the point $X$ to the line
$L$ zero.
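A small numerical check of this result can be written in Python. The sketch assumes ordinary Euclidean 3-vectors with the usual cross product in place of the commutator product of bivectors; because the same product is used for both the moment and the residual, the norm of the residual is unaffected by this choice. The function names are illustrative.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(sum(c * c for c in a))


def point_line_residual(x, n, m):
    """Norm of m - n x x for a unit direction n and moment m = n x v."""
    nx = cross(n, x)
    return norm(tuple(mi - ci for mi, ci in zip(m, nx)))


n = (0.0, 0.0, 1.0)              # direction of the line (parallel to the z-axis)
v = (1.0, 0.0, 0.0)              # a point on the line
m = cross(n, v)                  # moment of the line

on_line = point_line_residual((1.0, 0.0, 5.0), n, m)
off_line = point_line_residual((1.0, 2.0, 5.0), n, m)
```

The residual vanishes for the point on the line and equals 2 for the second point, which lies two units away from the line.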
4.2 Point-plane constraint

Evaluating the constraint of a point $X = 1 + Ix$ coplanar to a plane $P = p + Id$
leads to

$0 = PX - \overline{X}\,\overline{P} = (p + Id)(1 + Ix) - (1 - Ix)(p - Id)$
$= p + Ipx + Id - p + Id + Ixp = I(2d + px + xp)$
$\Leftrightarrow 0 = I(d + p \cdot x).$

Since $I \neq 0$, although $I^2 = 0$, the aim is to analyze the scalar $d + p \cdot x$. Suppose
$X \notin P$. The value $d$ can be interpreted as a sum so that $d = d_{01} + d_{02}$ and $d_{01}p$
is the orthogonal projection of $x$ onto $p$. Figure 3 shows the scenario. Then we
can calculate

$d + p \cdot x = d_{01} + d_{02} + p \cdot x = d_{01} + p \cdot x + d_{02} = d_{02}.$

Fig. 3. The value $d$ can be interpreted as a sum $d = d_{01} + d_{02}$ so that $d_{01}p$ corresponds
to the orthogonal projection of $x$ onto $p$.

The value of the expression $d + p \cdot x$ corresponds to the Hesse distance of the
point $X$ to the plane $P$.
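The corresponding Euclidean computation is sketched below. It assumes ordinary 3-vectors and the Euclidean dot product, which carries the opposite sign to the inner product of bivectors (compare $n \cdot p = -\cos(\alpha)$ in section 4.3), so the expression $d + p \cdot x$ appears here as $p \cdot x - d$ up to an overall sign. The function name is illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def point_plane_residual(x, p, d):
    """Signed distance of x from the plane {y : p . y = d}, for a unit normal p."""
    return dot(p, x) - d


p, d = (0.0, 0.0, 1.0), 2.0      # the plane z = 2
inside = point_plane_residual((5.0, -1.0, 2.0), p, d)
above = point_plane_residual((5.0, -1.0, 3.5), p, d)
```

The residual is zero for the point in the plane and 1.5 for the point lying 1.5 units above it, so its magnitude can be read directly as a distance.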

4.3 Line-plane constraint 

Evaluating the constraint of a line $L = n + Im$ coplanar to a plane $P = p + Id$
leads to

$0 = LP + P\overline{L} = (n + Im)(p + Id) + (p + Id)(n - Im)$
$= np + Imp + Ind + pn + Ind - Ipm$
$= np + pn + I(2dn - pm + mp)$
$\Leftrightarrow 0 = n \cdot p + I(dn - p \times m)$

Thus, the constraint can be partitioned into one constraint on the real part of the
motor and one constraint on the dual part of the motor. The aim is to analyze
the scalar $n \cdot p$ and the bivector $dn - (p \times m)$ independently. Suppose $L \notin P$.
If $n \not\perp p$ the real part leads to

$n \cdot p = -\|n\| \|p\| \cos(\alpha) = -\cos(\alpha),$

where $\alpha$ is the angle between $L$ and $P$, see figure 4. If $n \perp p$, we have $n \cdot p = 0$.
Since the direction of the line is independent of the translation of the rigid body
motion, the constraint on the real part can be used to generate equations with
the parameters of the rotation as the only unknowns. The constraint on the dual
part can then be used to determine the unknown translation. In other words,
since the motor to be estimated, $M = R + I\frac{t}{2}R = R + IR'$, is determined in
its real part only by rotation, the real part of the constraint allows us to estimate
the rotor $R$, while the dual part of the constraint allows us to estimate the rotor
$R'$. So it is possible to sequentially separate equations on the unknown rotation
from equations on the unknown translation without the limitations known from
the embedding of the problem in Euclidean space [7]. This is very useful, since
the two smaller equation systems are easier to solve than one larger equation
system. To analyse the dual part of the constraint, we interpret the moment $m$
of the line representation $L = n + Im$ as $m = n \times s$ and choose a vector $s$
with $S = (1 + Is) \in L$ and $s \perp n$. By expressing the inner product as the
anti-commutator product, it can be shown ([1]) that $-(p \times m) = (s \cdot p)n - (n \cdot p)s$.
Now we can evaluate

$dn - (p \times m) = dn - (n \cdot p)s + (s \cdot p)n.$

Figure 4 shows the scenario. Further, we can find a vector $s_1 \parallel s$ with
$0 = d - (\|s\| + \|s_1\|) \cos(\beta)$. The vector $s_1$ might also be antiparallel to $s$. This
leads to a change of the sign, but does not affect the constraint itself. Now we
can evaluate

$dn - (n \cdot p)s + (s \cdot p)n = dn - \|s\| \cos(\beta) n + \cos(\alpha) s = \|s_1\| \cos(\beta) n + \cos(\alpha) s.$

The error of the dual part consists of the vector $s$ scaled by $\cos(\alpha)$ and the
direction $n$ scaled by the norm of $s_1$ and $\cos(\beta)$.

Fig. 4. The plane $P$ consists of its normal $p$ and the Hesse distance $d$. Furthermore we
choose $S = (1 + Is) \in L$ with $s \perp n$. The angle of $n$ and $p$ is $\alpha$ and the angle of $s$ and
$p$ is $\beta$. We choose the vector $s_1$ with $s \parallel s_1$ so that $dp$ is the orthogonal projection of
$(s + s_1)$ onto $p$.

If $n \perp p$, then $p \parallel s$ and thus we will find

$\|dn - (p \times m)\| = \|dn + (s \cdot p)n - (n \cdot p)s\| = \|(d + s \cdot p)n\| = |d + s \cdot p|.$

This means, in agreement with the point-plane constraint, that $(d + s \cdot p)$ describes
the Hesse distance of the line to the plane. This analysis shows that the considered
constraints are not only qualitative constraints, but also quantitative ones.
This is very important, since we want to measure the extent of fulfillment of
these constraints in the case of noisy data.
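The two parts of the line-plane constraint can also be evaluated with ordinary Euclidean vectors. The sketch below makes several assumptions of its own: it uses the Euclidean dot and cross products, takes the moment as $m = s \times n$ for a point $s$ on the line, and describes the plane by $p \cdot y = d$ with a unit normal $p$. With these conventions the dual residual becomes $dn + p \times m = (d - p \cdot s)n + (p \cdot n)s$, which plays the role of $dn - (p \times m)$ in the text up to the sign conventions of the bivector products.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_plane_residual(n, s, p, d):
    """Return (real part, norm of dual part) for the line through s with unit direction n."""
    m = cross(s, n)                               # Euclidean moment of the line
    real = dot(n, p)                              # zero when the line is parallel to the plane
    pm = cross(p, m)
    dual = tuple(d * ni + ci for ni, ci in zip(n, pm))
    return real, math.sqrt(dot(dual, dual))


p, d = (0.0, 0.0, 1.0), 1.0                       # the plane z = 1
n = (1.0, 0.0, 0.0)                               # line direction parallel to the plane

in_plane = line_plane_residual(n, (0.0, 5.0, 1.0), p, d)
parallel = line_plane_residual(n, (0.0, 0.0, 3.0), p, d)
```

Both lines are parallel to the plane, so the real part is zero in each case; the norm of the dual part is zero for the line in the plane and 2 for the line lying two units above it.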

5 Experiments 

In this section we present some experiments with real images. We expect that
both the particular constraint and the algorithmic approach of using it may
influence the results. In our experimental scenario we took a B21 mobile robot
equipped with a stereo camera head and positioned it two meters in front of a
calibration cube. We focused one camera on the calibration cube and took an
image. Then we moved the robot, focused the camera again on the cube and took
another image. The edge size of the calibration cube is 46 cm and the image size
is 384 × 288 pixels. Furthermore, we defined on the calibration cube a 3D object
model. Figure 5 shows the scenario. In the first row two perspective views of the
3D object model are shown. In the left image of the second row the calibration
is performed and the 3D object model is projected onto the image. Then the
camera is moved and corresponding line segments are extracted. To visualize
the movement, we also projected the 3D object model on its original position.
The aim is to find the pose of the model and so the motion of the camera. In
this experiment we actually selected certain points by hand and from these the
depicted line segments are derived and, knowing the camera calibration from
the cube of the first image, the actual projection ray and projection plane
parameters are computed. In table 2 we show the results of different algorithms
for pose estimation. In the second column of table 2 EKF denotes the use of an
extended Kalman filter. The design of the extended Kalman filters is described
in [6]. MAT denotes matrix algebra, SVD denotes the singular value decomposition
of a matrix to ensure a rotation matrix as a result. In the third column
the constraints used, point-line (XL), point-plane (XP) and line-plane (LP), are
indicated. The fourth column shows the results of the estimated rotation matrix
$\mathcal{R}$ and the translation vector $t$, respectively. Since the translation vectors are
in mm, the results differ by around 2-3 cm. The fifth column shows the error of
the equation system. Since the error of the equation system describes the Hesse
distance of the entities, the value of the error is an approximation of the squared
average distance of the entities. It is easy to see that the results obtained with
the different approaches are close to each other, though the implementation leads
to different algorithms. Furthermore the EKFs perform more stably than the
matrix solution approaches.

Fig. 5. The scenario of the experiment: In the top row two perspectives of the 3D
object model are shown. In the second row (left) the calibration is performed and the
3D object model is projected on the image. Then the camera moved and corresponding
line segments are extracted.
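The role of the SVD in the matrix-based variants can be illustrated by the standard projection of an approximately orthogonal matrix onto the rotation group. The following sketch uses NumPy and is a generic illustration of that technique under our own assumptions; it is not a description of the implementation used for table 2, and the function name `nearest_rotation` is hypothetical.

```python
import numpy as np


def nearest_rotation(a):
    """Project a 3x3 matrix onto the rotation group using its singular value decomposition."""
    u, _, vt = np.linalg.svd(a)
    # Flip the last singular direction if needed so that the determinant is +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


rng = np.random.default_rng(0)
angle = 0.3
true_r = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
noisy = true_r + 0.01 * rng.standard_normal((3, 3))   # a linear estimate that is not orthogonal
r = nearest_rotation(noisy)
```

The result `r` is orthogonal with determinant +1 and stays close to the undisturbed rotation, which is the property the SVD step is meant to restore after a linear estimate.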






292 



Bodo Rosenhahn et al. 




Table 2. The experiment 1 results in different qualities of derived motion parameters, 
depending on the used constraints and algorithms to evaluate their validity. 




Fig. 6. Visualization of some errors. We calculate the motion of the object and project 
the transformed object in the image planes. The extracted line segments are also shown. 
In the first and second row, the results of nos. 5, 3 and nos. 7, 8 of table 2 are visualised 
respectively. 



Some errors are visualized in figure 6. We calculated the motion of the object
and projected the transformed object into the image plane. The extracted line
segments are overlaid in addition. The first row of figure 6 shows the results
of nos. 5 and 3 of table 2, and the second row those of nos. 7 and 8. The
results of nos. 7 and 8 are very good compared with those of the other
algorithms.

These results are in agreement with the well-known behavior of error
propagation in the case of matrix-based rotation estimation. The EKF performs
more stably. This is a consequence of the estimators themselves and of the fact
that in our approach rotation is represented by rotors. The concatenation of
rotors is more robust than that of rotation matrices.

6 Conclusions 

The main contribution of the paper is to formulate 2D-3D pose determination in
the language of kinematics as a problem of estimating rotation and translation
from geometric constraint equations. There are three such constraints which
relate the model frame to an observation frame. The model data are either
points or lines. The observation frame is constituted by lines or planes. Any
deviation from a constraint corresponds to the Hesse distance of the involved
geometric entities. From this starting point, the motor algebra has been
introduced as a useful algebraic frame for handling line motion. This is an
eight-dimensional linear space with the property of representing rigid
movements in a linear manner. The use of the motor algebra allows the pose
estimation problem to be subsumed in compact equations, since the entities, the
transformation of the entities and the constraints for collinearity or
coplanarity of entities can be described very economically. Furthermore, the
introduced constraints contain a natural distance measure, the Hesse distance.
This is the reason why the geometric constraints are well conditioned (in
contrast to invariances) and thus behave more robustly in the case of noisy
data.
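
The role of the Hesse distance in the point-line and point-plane constraints
can be illustrated with a minimal sketch. It uses ordinary vector algebra
rather than the motor algebra of the paper, and it assumes that a line is
given by a unit direction n and moment m = p x n (p any point on the line) and
a plane by a unit normal and its distance d from the origin. All function names
are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_line_distance(x, n, m):
    """Distance of point x from the line (n, m), assuming |n| = 1 and m = p x n."""
    r = cross(x, n)
    return math.sqrt(sum((ri - mi) ** 2 for ri, mi in zip(r, m)))

def point_plane_distance(x, normal, d):
    """Signed Hesse distance of x from the plane normal . y = d, assuming |normal| = 1."""
    return sum(xi * ni for xi, ni in zip(x, normal)) - d

# Line through p = (0, 0, 1) running along the x-axis.
n = (1.0, 0.0, 0.0)
m = cross((0.0, 0.0, 1.0), n)

print(point_line_distance((5.0, 3.0, 1.0), n, m))                   # 3.0
print(point_plane_distance((5.0, 3.0, 1.0), (0.0, 0.0, 1.0), 1.0))  # 0.0
```

Both residuals vanish exactly when the constraint is fulfilled and otherwise
measure a Euclidean distance, which is the property the conclusions refer to.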
Abstract. Integral transforms and the signal representations associated
with them are important tools in applied mathematics and signal
theory. The Fourier transform and the Laplace transform are certainly
the best known and most commonly used integral transforms. However,
the Fourier transform is just one of many ways of signal representation
and there are many other transforms of interest. In the past 20 years,
other analytical methods have been proposed and applied, for example,
wavelet, Walsh, Legendre, Hermite, Gabor, fractional Fourier analysis,
etc. Regardless of their particular merits they are not as useful as the
classical Fourier representation, which is closely connected to such powerful
concepts of signal theory as linear and nonlinear convolutions, classical
and high-order correlations, invariance with respect to shift, ambiguity
and Wigner distributions, etc. To obtain the general properties and
important tools of the classical Fourier transform for an arbitrary
orthogonal transform, we associate with it generalized shift operators and
develop the theory of abstract harmonic analysis of signals and of linear and
nonlinear systems that are invariant with respect to these generalized shift
operators.



1 Generalized Convolutions and Correlations 

Integral transforms and the signal representations associated with them are
important concepts in applied mathematics and in signal theory. The Fourier
transform is certainly the best known of the integral transforms and, with the
Laplace transform, also the most useful. Since its introduction by Fourier in
the early 1800s, it has found use in innumerable applications and has itself
led to the development of other transforms. We recall the most important
properties of the classical Fourier transform: theorems of translation,
modulation, scaling, convolution, correlation, differentiation, integration,
etc. However, the Fourier transform is just one of many ways of signal
representation; there are many other transforms of interest.

In the past 20 years, other analytical tools have been proposed and applied.
An important aspect of many of these representations is the possibility of
extracting relevant information from a signal: information that is actually
present but hidden in its complex representation. However, they are less
effective analysis tools than the group-theoretical classical Fourier
representation, since the latter is based on such useful and powerful tools of
signal theory as linear and nonlinear convolutions, classical and higher-order
correlations, invariance with respect to shift, ambiguity and Wigner
distributions, etc. The other integral representations have no such tools.

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 294-308, 2000.

The ordinary group shift operators

$$(T_t^{\tau} x)(t) := (T^{\tau} x)(t) := x(t \oplus \tau)$$

play the leading role in all the properties and tools of the Fourier transform
mentioned above. In order to develop for each orthogonal transform a set of
tools and properties as wide as that of the Fourier transform, we associate
with each orthogonal transform a family of commutative generalized shift
operators. Such families form commutative hypergroups. Only in particular cases
are these hypergroups the well-known abelian groups. We will show that many
well-known harmonic analysis theorems extend to the commutative hypergroups
associated with arbitrary Fourier transforms.

1.1 Generalized Shift Operators 

Let $y = f(t) : \Omega \to \mathbb{C}$ be a complex-valued signal. Usually
$\Omega = \mathbb{R}^n \times T$, where $\mathbb{R}^n$ is an n-D vector space
and $T$ is a compact (temporal) subset of $\mathbb{R}$. Let $\Omega^*$ be the
space dual to $\Omega$. We call $\Omega^*$ the spectral domain and $\Omega$ the
signal domain, keeping the original notion of $t \in \Omega$ as "time" and
$\omega \in \Omega^*$ as "frequency". Let

$$L(\Omega,\mathbb{C}) := \{f(t) \mid f(t) : \Omega \to \mathbb{C}\} \quad\text{and}\quad L(\Omega^*,\mathbb{C}) := \{F(\omega) \mid F(\omega) : \Omega^* \to \mathbb{C}\}$$

be two vector spaces of square-integrable functions. In the following we assume
that the functions satisfy certain general properties so that pathological cases
where formulas would not hold are avoided. Let $\{\varphi_\omega(t)\}$ be an
orthonormal system of functions in $L(\Omega,\mathbb{C})$. Then for any function
$f(t) \in L(\Omega,\mathbb{C})$ there exists a function
$F(\omega) \in L(\Omega^*,\mathbb{C})$ for which the following equations hold:

$$F(\omega) = \mathcal{F}\{f\}(\omega) = \int_{t\in\Omega} f(t)\,\bar\varphi_\omega(t)\,d\mu(t), \tag{1}$$

$$f(t) = \mathcal{F}^{-1}\{F\}(t) = \int_{\omega\in\Omega^*} F(\omega)\,\varphi_\omega(t)\,d\mu(\omega), \tag{2}$$

where $\mu(t)$, $\mu(\omega)$ are certain suitable measures on the signal and
spectral domains, respectively.

The function $F(\omega)$ is called the $\mathcal{F}$-spectrum of a signal
$f(t)$, and the expressions (1)-(2) are called the pair of abstract Fourier
transforms (or $\mathcal{F}$-transforms). In the following we will use the
notation $f(t) \longleftrightarrow F(\omega)$ to indicate an
$\mathcal{F}$-transform pair.
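
As an illustration of the pair (1)-(2), the following minimal sketch takes a
finite signal domain {0, ..., N-1} with N a power of two and uses the Walsh
functions as the basis. The choice of basis, the counting measure on the signal
domain and the measure 1/N on the spectral domain are assumptions made for this
example only; the function names are hypothetical.

```python
def walsh(w, t):
    """Walsh function phi_w(t) = (-1)^(number of 1-bits shared by w and t)."""
    return -1 if bin(w & t).count("1") % 2 else 1

def forward(f):
    """Finite analogue of (1). The Walsh functions are real, so the conjugate is trivial."""
    n = len(f)
    return [sum(f[t] * walsh(w, t) for t in range(n)) for w in range(n)]

def inverse(F):
    """Finite analogue of (2), with measure 1/N on the spectral domain."""
    n = len(F)
    return [sum(F[w] * walsh(w, t) for w in range(n)) / n for t in range(n)]

f = [1, 2, 3, 4]
F = forward(f)
print(F)           # [10, -2, -4, 0]
print(inverse(F))  # [1.0, 2.0, 3.0, 4.0]
```

Any other orthogonal system could be substituted for `walsh`, provided the
measures are chosen so that the inverse reproduces the signal.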

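Along with the "time" and "frequency" domains we will work with the
"time-time" $\Omega \times \Omega$, "time-frequency" $\Omega \times \Omega^*$,
"frequency-time" $\Omega^* \times \Omega$ and "frequency-frequency"
$\Omega^* \times \Omega^*$ domains, and with four joint distributions, which
are denoted by double letters: $\mathrm{ff}(t,\tau) \in L^2(\Omega\times\Omega,\mathbb{C})$,
$\mathrm{Ff}(\omega,\tau) \in L^2(\Omega^*\times\Omega,\mathbb{C})$,
$\mathrm{fF}(t,\nu) \in L^2(\Omega\times\Omega^*,\mathbb{C})$,
$\mathrm{FF}(\omega,\nu) \in L^2(\Omega^*\times\Omega^*,\mathbb{C})$.

Fundamental and important tools of signal theory are the time-shift and
frequency-shift operators. They are defined as $(T^{\tau}f)(t) := f(t+\tau)$,
$(D^{\nu}F)(\omega) := F(\omega+\nu)$. For $f(t) = e^{j\omega t}$ and
$F(\omega) = e^{-j\omega t}$ we have

$$T^{\tau}e^{j\omega t} = e^{j\omega(t+\tau)} = e^{j\omega\tau}e^{j\omega t}, \qquad D^{\nu}e^{-j\omega t} = e^{-j(\omega+\nu)t} = e^{-j\nu t}e^{-j\omega t},$$

i.e., the functions $e^{j\omega t}$, $e^{-j\omega t}$ are eigenfunctions of the
time-shift and frequency-shift operators $T^{\tau}$ and $D^{\nu}$ corresponding
to the eigenvalues $\lambda_\tau = e^{j\omega\tau}$, $\omega \in \Omega^*$, and
$\lambda_\nu = e^{-j\nu t}$, $t \in \Omega$, respectively. We now generalize
this result.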

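Definition 1. The following operators (with respect to which all basis functions
$\varphi_\omega(t)$ are invariant eigenfunctions)

$$(T_t^{\tau}\varphi_\omega)(t) = \int_{\sigma\in\Omega} T_{t\sigma}^{\tau}\,\varphi_\omega(\sigma)\,d\mu(\sigma) = \varphi_\omega(t)\,\varphi_\omega(\tau), \tag{3}$$

$$(\tilde T_t^{\tau}\varphi_\omega)(t) = \int_{\sigma\in\Omega} \tilde T_{t\sigma}^{\tau}\,\varphi_\omega(\sigma)\,d\mu(\sigma) = \varphi_\omega(t)\,\bar\varphi_\omega(\tau), \tag{4}$$

$$(D_\omega^{\nu}\varphi_\omega)(t) = \int_{\alpha\in\Omega^*} D_{\omega\alpha}^{\nu}\,\varphi_\alpha(t)\,d\mu(\alpha) = \varphi_\omega(t)\,\varphi_\nu(t), \tag{5}$$

$$(\tilde D_\omega^{\nu}\varphi_\omega)(t) = \int_{\alpha\in\Omega^*} \tilde D_{\omega\alpha}^{\nu}\,\varphi_\alpha(t)\,d\mu(\alpha) = \varphi_\omega(t)\,\bar\varphi_\nu(t) \tag{6}$$

are called commutative $\mathcal{F}$-generalized "time"-shift and
"frequency"-shift operators (GSOs), respectively, where $\varphi_\omega(\tau)$,
$\varphi_\nu(t)$ are the eigenvalues of the GSOs $T^{\tau}$ and $D^{\nu}$,
respectively.

For these operators we introduce the following designations:

$$(T_t^{\tau}\varphi_\omega)(t) = \varphi_\omega(t \boxplus \tau), \qquad (\tilde T_t^{\tau}\varphi_\omega)(t) = \varphi_\omega(t \boxminus \tau),$$

$$(D_\omega^{\nu}\varphi_\omega)(t) = \varphi_{\omega\oplus\nu}(t), \qquad (\tilde D_\omega^{\nu}\varphi_\omega)(t) = \varphi_{\omega\ominus\nu}(t).$$

Here the symbols $\boxplus$, $\oplus$ and $\boxminus$, $\ominus$ denote the
quasi-sum and quasi-difference, respectively. The expressions (3)-(6) are
called multiplication formulae for the basis functions $\varphi_\omega(t)$. They
show that the set of basis functions forms two hypergroups with respect to the
multiplication rules (3) and (5), respectively. We see also that the two
families of time and frequency GSOs form two hypergroups.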

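For $f(t) \in L(\Omega,\mathbb{C})$, $F(\omega) \in L(\Omega^*,\mathbb{C})$ we define

$$f(t \boxplus \tau) := \int_{\omega\in\Omega^*} [F(\omega)\,\varphi_\omega(\tau)]\,\varphi_\omega(t)\,d\mu(\omega), \qquad f(t \boxminus \tau) := \int_{\omega\in\Omega^*} [F(\omega)\,\bar\varphi_\omega(\tau)]\,\varphi_\omega(t)\,d\mu(\omega),$$

$$F(\omega \oplus \nu) := \int_{t\in\Omega} [f(t)\,\bar\varphi_\nu(t)]\,\bar\varphi_\omega(t)\,d\mu(t), \qquad F(\omega \ominus \nu) := \int_{t\in\Omega} [f(t)\,\varphi_\nu(t)]\,\bar\varphi_\omega(t)\,d\mu(t).$$

In particular, the first and second main theorems of generalized harmonic
analysis follow: the theorems of generalized shifts and of generalized
modulations, respectively:

$$f(t \boxplus \tau) \longleftrightarrow F(\omega)\,\varphi_\omega(\tau), \qquad f(t \boxminus \tau) \longleftrightarrow F(\omega)\,\bar\varphi_\omega(\tau), \tag{7}$$

$$f(t)\,\bar\varphi_\nu(t) \longleftrightarrow F(\omega \oplus \nu), \qquad f(t)\,\varphi_\nu(t) \longleftrightarrow F(\omega \ominus \nu). \tag{8}$$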
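
The generalized shift and the shift theorem (7) can be checked numerically in
a simple special case. The sketch below assumes the Walsh basis on
{0, ..., N-1} with N a power of two (an illustrative choice, not taken from the
text); for this basis the quasi-sum reduces to the bitwise XOR of the
arguments. Function names are hypothetical.

```python
def walsh(w, t):
    """Walsh function phi_w(t) = (-1)^(number of 1-bits shared by w and t)."""
    return -1 if bin(w & t).count("1") % 2 else 1

def forward(f):
    """Finite analogue of (1); the Walsh functions are real, so no conjugate is needed."""
    n = len(f)
    return [sum(f[t] * walsh(w, t) for t in range(n)) for w in range(n)]

def generalized_shift(f, tau):
    """f(t [+] tau): multiply the spectrum by phi_w(tau), then invert as in (2)."""
    n = len(f)
    F = forward(f)
    return [sum(F[w] * walsh(w, tau) * walsh(w, t) for w in range(n)) / n
            for t in range(n)]

f = [1, 2, 3, 4, 5, 6, 7, 8]
shifted = generalized_shift(f, 3)
print(shifted)  # same values as [f[t ^ 3] for t in range(8)]
```

The spectrum of `shifted` equals the spectrum of `f` multiplied pointwise by
`walsh(w, 3)`, which is the finite form of (7).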






1.2 Generalized Convolutions and Correlations 

It is well known that any stationary linear dynamic system (LDS) is described
by the convolution integral. Using the notion of a GSO, we can formally
generalize the notions of convolution and correlation.

Definition 2. The following functions

$$y(t) := (h \diamondsuit x)(t) = \int_{\tau\in\Omega} h(\tau)\,x(t \boxminus \tau)\,d\mu(\tau), \tag{9}$$

$$Y(\omega) := (H \heartsuit X)(\omega) = \int_{\nu\in\Omega^*} H(\nu)\,X(\omega \ominus \nu)\,d\mu(\nu) \tag{10}$$

are called the T-invariant time and D-invariant spectral convolutions (or
$\diamondsuit$-convolution and $\heartsuit$-convolution), respectively.

The spaces $L(\Omega,\mathbb{C})$ and $L(\Omega^*,\mathbb{C})$ equipped with the
multiplications $\diamondsuit$ and $\heartsuit$ form commutative Banach signal
and spectral convolution algebras $\langle L(\Omega,\mathbb{C}), \diamondsuit\rangle$
and $\langle L(\Omega^*,\mathbb{C}), \heartsuit\rangle$, respectively. The
classical convolution algebras $\langle L(\Omega,\mathbb{C}), *\rangle$ and
$\langle L(\Omega^*,\mathbb{C}), *\rangle$ are obtained if $\Omega$ is an
abelian group and $\varphi_\omega(t)$ are its characters.

Definition 3. The following expressions

$$(f \clubsuit g)(\tau) := \int_{t\in\Omega} f(t)\,\bar g(t \boxminus \tau)\,d\mu(t), \qquad (F \spadesuit G)(\nu) := \int_{\omega\in\Omega^*} F(\omega)\,\bar G(\omega \ominus \nu)\,d\mu(\omega) \tag{11}$$

are referred to as T-invariant and D-invariant cross-correlation functions of
the signals and of the spectra, respectively.

The measures of similarity between TF and FT distributions and their time- and
frequency-shifted versions are their cross-correlation functions.

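Definition 4. The following expressions

$$(\mathrm{fF} \clubsuit\spadesuit\, \mathrm{gG})(\tau,\nu) := \int_{t\in\Omega}\int_{\omega\in\Omega^*} \mathrm{fF}(t,\omega)\,\overline{\mathrm{gG}}(t \boxminus \tau, \omega \ominus \nu)\,d\mu(t)\,d\mu(\omega), \tag{12}$$

$$(\mathrm{Ff} \spadesuit\clubsuit\, \mathrm{Gg})(\nu,\tau) := \int_{\omega\in\Omega^*}\int_{t\in\Omega} \mathrm{Ff}(\omega,t)\,\overline{\mathrm{Gg}}(\omega \ominus \nu, t \boxminus \tau)\,d\mu(t)\,d\mu(\omega) \tag{13}$$

are referred to as TD- and DT-invariant cross-correlation functions of the
distributions, respectively. If $\mathrm{fF}(t,\omega) = \mathrm{gG}(t,\omega)$
and $\mathrm{Ff}(\omega,t) = \mathrm{Gg}(\omega,t)$, then the cross-correlation
functions are simply the autocorrelation functions.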

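Theorem 1 (third main theorem of generalized harmonic analysis).
The generalized Fourier transforms (1) and (2) map the linear $\diamondsuit$-
and $\heartsuit$-convolutions and the linear $\clubsuit$- and
$\spadesuit$-correlations into products of spectra and signals:

$$\mathcal{F}\{(h \diamondsuit x)(t)\} = \mathcal{F}\{h(t)\}\,\mathcal{F}\{x(t)\}, \qquad \mathcal{F}\{(f \clubsuit g)(t)\} = \mathcal{F}\{f(t)\}\,\overline{\mathcal{F}\{g(t)\}},$$

$$\mathcal{F}^{-1}\{(H \heartsuit X)(\omega)\} = \mathcal{F}^{-1}\{H(\omega)\}\,\mathcal{F}^{-1}\{X(\omega)\}, \qquad \mathcal{F}^{-1}\{(F \spadesuit G)(\omega)\} = \mathcal{F}^{-1}\{F(\omega)\}\,\overline{\mathcal{F}^{-1}\{G(\omega)\}}.$$

Taking special forms of the GSOs one can obtain known types of convolutions
and cross-correlations: arithmetic, cyclic, dyadic, m-adic, etc. Signal and
spectral algebras have many of the properties associated with classical group
convolution algebras; many of them are catalogued in [1]-[2].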
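
The dyadic case mentioned above gives a compact numerical check of the
convolution theorem. The sketch below assumes the Walsh basis on
{0, ..., N-1} with N a power of two, for which the quasi-difference is the
bitwise XOR; function names are hypothetical.

```python
def walsh(w, t):
    """Walsh function phi_w(t) = (-1)^(number of 1-bits shared by w and t)."""
    return -1 if bin(w & t).count("1") % 2 else 1

def forward(f):
    """Finite analogue of (1) for the real Walsh basis."""
    n = len(f)
    return [sum(f[t] * walsh(w, t) for t in range(n)) for w in range(n)]

def dyadic_convolve(h, x):
    """Finite analogue of (9): y(t) = sum_tau h(tau) * x(t XOR tau)."""
    n = len(x)
    return [sum(h[tau] * x[t ^ tau] for tau in range(n)) for t in range(n)]

h = [1, 0, 2, 0]
x = [1, 2, 3, 4]
y = dyadic_convolve(h, x)
print(y)                                                   # [7, 10, 5, 8]
print(forward(y))                                          # [30, -6, 4, 0]
print([a * b for a, b in zip(forward(h), forward(x))])     # [30, -6, 4, 0]
```

Replacing XOR by addition modulo N and the Walsh functions by the complex
exponentials would give the cyclic case in the same way.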

2 Generalized Weyl Convolutions and Correlations

2.1 Generalized Weyl Convolutions and Correlations

Linear time-invariant filtering may be viewed as an evaluation of weighted
superpositions of time-shifted versions of the input signal:

$$y(t) = (h * x)(t) = \int_{-\infty}^{+\infty} h(\tau)\,x(t-\tau)\,d\tau = \int_{-\infty}^{+\infty} h(\tau)\,(T^{-\tau}x)(t)\,d\tau. \tag{14}$$

Every linear time-invariant filter is a weighted superposition of time-shifts,
and conversely, every weighted superposition of time-shifts is a linear
time-invariant filter.

The frequency convolution

$$Y(\omega) = (H * X)(\omega) = \int_{-\infty}^{+\infty} H(\nu)\,X(\omega-\nu)\,d\nu = \int_{-\infty}^{+\infty} H(\nu)\,(D^{-\nu}X)(\omega)\,d\nu \tag{15}$$

is expressed as a weighted superposition of frequency-shifted versions of the
spectrum.

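We can combine expressions (14)-(15) into three time-frequency weighted
superpositions of time-frequency shifts of the signal $x(t-\tau)e^{j\nu t}$,
the spectrum $X(\omega-\nu)e^{-j\omega\tau}$ and the frequency-time distribution
$\mathrm{Xx}(\omega-\nu, t-\tau)e^{j\nu t}e^{-j\omega\tau}$, respectively, which
are called Weyl convolutions:

1. The Weyl convolution of a FTD $\mathrm{Ww}(\nu,\tau)$ with a time-frequency
shifted signal $x(t-\tau)e^{j\nu t}$:

$$y(t) = \mathrm{Weyl}[\mathrm{Ww}; x](t) := \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathrm{Ww}(\nu,\tau)\,x(t-\tau)\,e^{j\nu t}\,d\tau\,d\nu. \tag{16}$$

2. The Weyl convolution of a TFD $\mathrm{wW}(\tau,\nu)$ with a frequency-time
shifted spectrum $X(\omega-\nu)e^{-j\omega\tau}$:

$$Y(\omega) = \mathrm{Weyl}[\mathrm{wW}; X](\omega) := \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathrm{wW}(\tau,\nu)\,X(\omega-\nu)\,e^{-j\omega\tau}\,d\tau\,d\nu. \tag{17}$$

3. The Weyl convolution of a FTD $\mathrm{Ww}(\nu,\tau)$ and
$\mathrm{Xx}(\omega-\nu, t-\tau)e^{j\nu t}e^{-j\omega\tau}$:

$$\mathrm{Weyl}[\mathrm{Ww}; \mathrm{Xx}](\omega,t) := \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathrm{Ww}(\nu,\tau)\,\mathrm{Xx}(\omega-\nu, t-\tau)\,e^{j\nu t}e^{-j\omega\tau}\,d\nu\,d\tau. \tag{18}$$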

Convolutions (16)-(18) were proposed by H. Weyl [3]. Now they are called
Weyl's convolutions. The operators Weyl[Ww; o] and Weyl[wW; o] defined in (16)
and (17) are time- and frequency-varying operators. The rules (16) and (18),
which relate the time-frequency symbols wW(t,w) and Ww(w,t) to unique operators
y(t) = Weyl[Ww; o] and Y(w) = Weyl[wW; o], are called Weyl correspondences.
It is easy to see that Weyl convolutions are invariant with respect to the
classical 2-D Heisenberg group consisting of all time-frequency shifts. The
Weyl correspondence may be used to form linear time-varying filters in two
different ways. In each case, the filter is defined by choosing a mask, or
symbol, in the time-frequency plane.
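
A cyclic, finite-length analogue of the Weyl convolution (16) makes the
"weighted superposition of time-frequency shifts" concrete. The discretisation
below (signal length N, shifts taken modulo N, modulation by exp(2*pi*j*nu*t/N))
is an assumption of this sketch and not part of the text; the function name is
hypothetical.

```python
import cmath

def weyl_convolve(W, x):
    """y[t] = sum over (nu, tau) of W[nu][tau] * x[(t - tau) % N] * exp(2j*pi*nu*t/N)."""
    n = len(x)
    return [sum(W[nu][tau] * x[(t - tau) % n] * cmath.exp(2j * cmath.pi * nu * t / n)
                for nu in range(n) for tau in range(n))
            for t in range(n)]

x = [1, 2, 3, 4]

# Symbol concentrated at (nu, tau) = (0, 1): a pure delay by one sample.
delay = [[0] * 4 for _ in range(4)]
delay[0][1] = 1
print(weyl_convolve(delay, x))

# Symbol concentrated at (nu, tau) = (1, 0): a pure modulation.
modulate = [[0] * 4 for _ in range(4)]
modulate[1][0] = 1
print(weyl_convolve(modulate, x))
```

A symbol supported on the row nu = 0 yields an ordinary cyclic convolution,
i.e. a time-invariant filter; any other support makes the filter time-varying.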

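Analogously, we can introduce the following Weyl correlations:

1. The Weyl correlation between a 2-D TFD $\mathrm{fF}(t,\omega)$ and a
time-frequency shifted signal $g(t-\tau)e^{j\nu t}$:

$$\mathrm{weylCor}[\mathrm{fF}; g](\tau,\nu) := \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathrm{fF}(t,\omega)\,\bar g(t-\tau)\,e^{-j\nu t}e^{j\omega\tau}\,d\omega\,dt. \tag{19}$$

2. The Weyl correlation between a 2-D FTD $\mathrm{Ff}(\omega,t)$ and a
time-frequency shifted spectrum $G(\omega-\nu)e^{-j\omega\tau}$:

$$\mathrm{Weylcor}[\mathrm{Ff}; G](\nu,\tau) := \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathrm{Ff}(\omega,t)\,\bar G(\omega-\nu)\,e^{j\omega\tau}e^{-j\nu t}\,dt\,d\omega. \tag{20}$$

3. The Weyl correlation between the 2-D TFDs
$\mathrm{gG}(t-\tau, \omega-\nu)e^{j\nu t}e^{-j\omega\tau}$ and
$\mathrm{fF}(t,\omega)$:

$$\mathrm{WEYLcor}[\mathrm{fF}; \mathrm{gG}](\nu,\tau) := \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathrm{fF}(t,\omega)\,\overline{\mathrm{gG}}(t-\tau, \omega-\nu)\,e^{-j\nu t}e^{j\omega\tau}\,d\omega\,dt. \tag{21}$$

We generalize relations (16)-(18) for an arbitrary pair of families of
generalized shift operators.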



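Definition 5. The following expressions are called $\mathcal{F}$-generalized
Weyl convolutions and are expressed as weighted superpositions of
time-frequency shifts of the signal $x(t \boxminus \tau)\varphi_\nu(t)$, the
spectrum $X(\omega \ominus \nu)\bar\varphi_\omega(\tau)$ and the frequency-time
distribution $\mathrm{Xx}(\omega \ominus \nu, t \boxminus \tau)\varphi_\nu(t)\bar\varphi_\omega(\tau)$,
respectively:

1. The Weyl convolution of a FTD $\mathrm{Ww}(\nu,\tau)$ with a time-frequency
shifted signal $x(t \boxminus \tau)\varphi_\nu(t)$:

$$\mathrm{Weyl}^{\mathcal{F}}[\mathrm{Ww}; x](t) := \int_{\nu\in\Omega^*}\int_{\tau\in\Omega} \mathrm{Ww}(\nu,\tau)\,x(t \boxminus \tau)\,\varphi_\nu(t)\,d\mu(\tau)\,d\mu(\nu). \tag{22}$$

2. The Weyl convolution of a TFD $\mathrm{wW}(\tau,\nu)$ with a frequency-time
shifted spectrum $X(\omega \ominus \nu)\bar\varphi_\omega(\tau)$:

$$\mathrm{Weyl}^{\mathcal{F}}[\mathrm{wW}; X](\omega) := \int_{\nu\in\Omega^*}\int_{\tau\in\Omega} \mathrm{wW}(\tau,\nu)\,X(\omega \ominus \nu)\,\bar\varphi_\omega(\tau)\,d\mu(\tau)\,d\mu(\nu). \tag{23}$$

3. The Weyl convolution of the two FTDs
$\mathrm{Xx}(\omega \ominus \nu, t \boxminus \tau)\varphi_\nu(t)\bar\varphi_\omega(\tau)$
and $\mathrm{Ww}(\nu,\tau)$:

$$\mathrm{Weyl}^{\mathcal{F}}[\mathrm{Ww}; \mathrm{Xx}](\omega,t) := \int_{\nu\in\Omega^*}\int_{\tau\in\Omega} \mathrm{Ww}(\nu,\tau)\,\mathrm{Xx}(\omega \ominus \nu, t \boxminus \tau)\,\varphi_\nu(t)\,\bar\varphi_\omega(\tau)\,d\mu(\nu)\,d\mu(\tau) \tag{24}$$

(in particular, here one can take $\mathrm{Xx}(\omega,t) := X(\omega)x(t)$).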

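By analogy with (19)-(20) we can design generalized Weyl correlation
functions.

Definition 6. The following expressions are called $\mathcal{F}$-generalized
Weyl correlations:

1. The Weyl correlation between a 2-D TFD $\mathrm{fF}(t,\omega)$ and a
time-frequency shifted signal $g(t \boxminus \tau)\varphi_\nu(t)\bar\varphi_\omega(\tau)$:

$$\mathrm{weylCor}^{\mathcal{F}}[\mathrm{fF}; g](\tau,\nu) := \int_{t\in\Omega}\int_{\omega\in\Omega^*} \mathrm{fF}(t,\omega)\,\bar g(t \boxminus \tau)\,\bar\varphi_\nu(t)\,\varphi_\omega(\tau)\,d\mu(\omega)\,d\mu(t). \tag{25}$$

2. The Weyl correlation between a 2-D FTD $\mathrm{Ff}(\omega,t)$ and a
time-frequency shifted spectrum $G(\omega \ominus \nu)\bar\varphi_\omega(\tau)\varphi_\nu(t)$:

$$\mathrm{Weylcor}^{\mathcal{F}}[\mathrm{Ff}; G](\nu,\tau) := \int_{\omega\in\Omega^*}\int_{t\in\Omega} \mathrm{Ff}(\omega,t)\,\bar G(\omega \ominus \nu)\,\varphi_\omega(\tau)\,\bar\varphi_\nu(t)\,d\mu(t)\,d\mu(\omega). \tag{26}$$

3. The Weyl correlation between the two 2-D TFDs
$\mathrm{gG}(t \boxminus \tau, \omega \ominus \nu)\varphi_\nu(t)\bar\varphi_\omega(\tau)$
and $\mathrm{fF}(t,\omega)$:

$$\mathrm{WEYLcor}^{\mathcal{F}}[\mathrm{fF}; \mathrm{gG}](\nu,\tau) := \int_{t\in\Omega}\int_{\omega\in\Omega^*} \mathrm{fF}(t,\omega)\,\overline{\mathrm{gG}}(t \boxminus \tau, \omega \ominus \nu)\,\bar\varphi_\nu(t)\,\varphi_\omega(\tau)\,d\mu(\omega)\,d\mu(t). \tag{27}$$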



2.2 Generalized Ambiguity Functions and Wigner Distributions 

The Wigner distribution was introduced in 1932 by E. Wigner [4] in the
context of quantum mechanics, where he defined the probability function of the
simultaneous values of the spatial coordinates and momenta. Wigner's idea was
introduced into signal analysis in 1948 by J. Ville [5], but it did not receive
much attention there until 1953, when P. Woodward [6] reformulated it in
the context of radar theory. Woodward proposed treating the question of radar
signal ambiguity as a part of the question of target resolution. For that, he
introduced a function that describes the correlation between a radar signal and
its Doppler-shifted and time-translated version. Physically, the ambiguity
function represents the energy in the received signal as a function of time
delay and Doppler frequency. This function describes the local ambiguity in
locating targets in range (time delay $\tau$) and in velocity (Doppler
frequency $\nu$). Its absolute value is called the uncertainty function, as it
is related to the uncertainty principle of radar signals. We can generalize
this notion in the following way.
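
Before generalizing, the classical notion can be made concrete for finite
sequences. The sketch below computes a cyclic cross-ambiguity function as the
correlation of f with a delayed and Doppler-shifted copy of g. The cyclic
discretisation and the sign convention of the exponent are assumptions of this
example; the function name is hypothetical.

```python
import cmath

def cross_ambiguity(f, g):
    """A[tau][nu] = sum_t f[t] * conj(g[(t - tau) % N]) * exp(-2j*pi*nu*t/N)."""
    n = len(f)
    return [[sum(f[t] * complex(g[(t - tau) % n]).conjugate()
                 * cmath.exp(-2j * cmath.pi * nu * t / n) for t in range(n))
             for nu in range(n)]
            for tau in range(n)]

f = [1, 2, 0, 0]
A = cross_ambiguity(f, f)      # auto-ambiguity function of f
print(abs(A[0][0]))            # signal energy, 5.0
print(max(abs(A[tau][nu]) for tau in range(4) for nu in range(4)))
```

By the Cauchy-Schwarz inequality the magnitude of the auto-ambiguity function
never exceeds its value at the origin, which equals the signal energy.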



Definition 7. The $\mathcal{F}$-generalized symmetric and asymmetric
cross-ambiguity functions of two pairs of functions $f, g$ and $F, G$ are
defined by



aF"[f,g]{T,iz) ■= T 



{fg'’(D^)}=y ‘Pu{t)dp.{t) 



Ar[F,G](i/,r) := jr-i|FG*(i/,w)| = y 0 0 G 0 0 ipu^{T)dfi{uj), 



UJ^ J?* 



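$$\mathrm{aF}^{a}[f,g](\tau,\nu) := \mathcal{F}\{\mathrm{fg}^{a}(\tau,t)\} = \int_{t\in\Omega} f(t)\,\bar g(t \boxminus \tau)\,\bar\varphi_\nu(t)\,d\mu(t),$$

$$\mathrm{Af}^{a}[F,G](\nu,\tau) := \mathcal{F}^{-1}\{\mathrm{FG}^{a}(\nu,\omega)\} = \int_{\omega\in\Omega^*} F(\omega)\,\bar G(\omega \ominus \nu)\,\varphi_\omega(\tau)\,d\mu(\omega).$$

Definition 8. The $\mathcal{F}$-generalized symmetric and asymmetric
cross-Wigner FTDs $\mathrm{Wd}^{s}[f,g](\omega,t)$, $\mathrm{Wd}^{a}[f,g](\omega,t)$
and cross-Wigner TFDs $\mathrm{wD}^{s}[F,G](t,\omega)$, $\mathrm{wD}^{a}[F,G](t,\omega)$
of two pairs of functions $f, g$ and $F, G$ are defined by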



Wd'*[/,g](wG) := F {fg'’(Gi)}= f / ® 0 5 B 0 






wD*[F,G](t,L^) :=^-i{FG^(z/,o;)} = J [f(o; © 0 g(o; © 






Wd“[/,g](o;,t) := jf {i^°^{T,t)} = f{f)F{uj)g>^{f), 



wD“[F,G](t,u;) := {FG“(f,u;)} = F{co)f{t)g,^{t), 



$\mathcal{F}$-generalized asymmetric Wigner distributions are also called
$\mathcal{F}$-generalized Rihaczek distributions.



3 High-Order Volterra Convolutions and Correlations



The study of nonlinear systems $y(t) = \mathrm{SYST}[x(t)]$ was started by
Volterra [7], who investigated analytic operators and introduced the
representation

$$y(t) = \mathrm{SYST}[x(t)] = \mathrm{Volt}[h_1, h_2, \ldots, h_q, \ldots; x](t) = \mathrm{Volt}[\mathbf{h}; x](t) =$$

$$= \int_{-\infty}^{\infty} h_1(\sigma_1)\,x(t-\sigma_1)\,d\sigma_1 + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_2(\sigma_1,\sigma_2)\,x(t-\sigma_1)\,x(t-\sigma_2)\,d\sigma_1\,d\sigma_2 + \ldots$$

$$\ldots + \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h_q(\sigma_1,\ldots,\sigma_q)\,x(t-\sigma_1)\cdots x(t-\sigma_q)\,d\sigma_1\cdots d\sigma_q + \ldots \tag{28}$$

where $q = 1, 2, \ldots$; the signals $x(t)$ and $y(t)$ are the input and
output, respectively, of the system SYST at time $t$;
$h_q(\sigma_1,\ldots,\sigma_q)$ is the q-th order Volterra kernel; and the set
of kernels $\mathbf{h} := (h_1, h_2, \ldots, h_q, \ldots)$ is the full
characteristic of the nonlinear system SYST. Equation (28) is also known as a
Volterra series.
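
A discrete-time Volterra series truncated after the second-order term can be
sketched as follows. The truncation order, the finite kernel memory and the
zero extension of the input outside its support are assumptions of this
example; the function name is hypothetical.

```python
def volterra2(h1, h2, x):
    """y[t] = sum_a h1[a] x[t-a] + sum_{a,b} h2[a][b] x[t-a] x[t-b].

    h1 has length m, h2 is an m-by-m nested list, and x is treated as zero
    outside its index range.
    """
    def xs(i):
        return x[i] if 0 <= i < len(x) else 0.0

    m = len(h1)
    y = []
    for t in range(len(x)):
        linear = sum(h1[a] * xs(t - a) for a in range(m))
        quadratic = sum(h2[a][b] * xs(t - a) * xs(t - b)
                        for a in range(m) for b in range(m))
        y.append(linear + quadratic)
    return y

h1 = [1.0, 0.5]
h2 = [[0.1, 0.0], [0.0, 0.0]]
x = [1.0, 2.0, 3.0]
print(volterra2(h1, h2, x))   # approximately [1.1, 2.9, 4.9]
```

With a zero second-order kernel the operator reduces to an ordinary linear
convolution, which corresponds to the first term of (28).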



3.1 Generalized Volterra Convolutions and Correlations 

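By analogy with the classical high-order convolutions and correlations we
introduce generalized T-stationary and D-stationary high-order convolutions
and correlations.

Definition 9. The following expressions

$$y^{(q)}(t) = (h_q \diamondsuit^{q} x)(t) := \int_{\sigma_1\in\Omega}\cdots\int_{\sigma_q\in\Omega} h_q(\sigma_1,\ldots,\sigma_q)\left[\prod_{i=1}^{q} x(t \boxminus \sigma_i)\,d\mu(\sigma_i)\right], \tag{29}$$

$$Y^{(p)}(\omega) = (H_p \heartsuit^{p} X)(\omega) := \int_{\alpha_1\in\Omega^*}\cdots\int_{\alpha_p\in\Omega^*} H_p(\alpha_1,\ldots,\alpha_p)\left[\prod_{j=1}^{p} X(\omega \ominus \alpha_j)\,d\mu(\alpha_j)\right], \tag{30}$$

$$(f \clubsuit^{q} g)(\tau_1,\ldots,\tau_q) := \int_{t\in\Omega} f(t)\left[\prod_{i=1}^{q} \bar g(t \boxminus \tau_i)\right]d\mu(t), \tag{31}$$

$$(F \spadesuit^{p} G)(\nu_1,\ldots,\nu_p) := \int_{\omega\in\Omega^*} F(\omega)\left[\prod_{j=1}^{p} \bar G(\omega \ominus \nu_j)\right]d\mu(\omega) \tag{32}$$

are called the T-invariant q-th order time convolution (correlation) and
D-invariant p-th order frequency convolution (correlation), respectively.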



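Obviously, the nonlinear operators

$$y(t) = \mathrm{Volt}^{\diamondsuit}[\mathbf{h}; x](t) = \sum_{q=1}^{\infty} (h_q \diamondsuit^{q} x)(t),$$

$$Y(\omega) = \mathrm{Volt}^{\heartsuit}[\mathbf{H}; X](\omega) = \sum_{p=1}^{\infty} (H_p \heartsuit^{p} X)(\omega),$$

$$\mathrm{Voltcor}^{\clubsuit}[f; g](\tau_1,\ldots,\tau_q,\ldots) := \sum_{q=1}^{\infty} (f \clubsuit^{q} g)(\tau_1,\ldots,\tau_q),$$

$$\mathrm{VoltCor}^{\spadesuit}[F; G](\nu_1,\ldots,\nu_p,\ldots) := \sum_{p=1}^{\infty} (F \spadesuit^{p} G)(\nu_1,\ldots,\nu_p)$$

can be called full T-invariant and D-invariant Volterra convolution and
Volterra correlation operators, where $\mathbf{h} := (h_1, \ldots, h_q, \ldots)$,
$\mathbf{H} := (H_1, \ldots, H_p, \ldots)$. The T-invariant Volterra operators
describe nonlinear T-stationary dynamic systems. T-stationarity means the
following: if $y(t)$ is the output of such a system for the input signal
$x(t)$, then the signal $y(t \boxplus s)$ will be the output for
$x(t \boxplus s)$: $y(t \boxplus s) = \mathrm{Volt}^{\diamondsuit}[\mathbf{h}; x(t \boxplus s)]$.
An analogous statement holds for D-invariant Volterra convolutions.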
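
T-stationarity can be verified numerically in the dyadic special case. The
sketch below assumes a signal domain {0, ..., N-1} with N a power of two, on
which both the quasi-sum and the quasi-difference are the bitwise XOR, and it
truncates the Volterra operator after the second-order term. These choices and
the function names are assumptions of the example.

```python
def dyadic_shift(x, s):
    """Generalized shift for the dyadic group: (T^s x)(t) = x(t XOR s)."""
    return [x[t ^ s] for t in range(len(x))]

def dyadic_volterra2(h1, h2, x):
    """Second-order truncation of the T-invariant Volterra operator with XOR shifts."""
    n = len(x)
    return [sum(h1[a] * x[t ^ a] for a in range(n))
            + sum(h2[a][b] * x[t ^ a] * x[t ^ b]
                  for a in range(n) for b in range(n))
            for t in range(n)]

h1 = [1, 2, 0, -1]
h2 = [[1, 0, 0, 2],
      [0, 0, 1, 0],
      [0, 3, 0, 0],
      [1, 0, 0, 0]]
x = [3, 1, 4, 1]
s = 2

y = dyadic_volterra2(h1, h2, x)
# Shifting the input and then filtering equals filtering and then shifting.
print(dyadic_volterra2(h1, h2, dyadic_shift(x, s)) == dyadic_shift(y, s))  # True
```

The equality holds for every shift s and every choice of kernels, because each
term of the operator depends on t only through the shifted samples.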

Signal and spectrum can be processed not only separately but also jointly,
giving nonlinear Volterra TF and FT distributions as a result.

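Definition 10. The following TF and FT distributions

$$\mathrm{yY}^{(q,p)}(t,\omega) = (\mathrm{hH}_{qp} \diamondsuit^{q} x\, \heartsuit^{p} X)(t,\omega) :=$$

$$= \int_{\sigma_1\in\Omega}\cdots\int_{\sigma_q\in\Omega}\int_{\alpha_1\in\Omega^*}\cdots\int_{\alpha_p\in\Omega^*} \mathrm{hH}_{qp}(\sigma_1,\ldots,\sigma_q; \alpha_1,\ldots,\alpha_p)\left[\prod_{i=1}^{q} x(t \boxminus \sigma_i)\,d\mu(\sigma_i)\right] \times \left[\prod_{j=1}^{p} X(\omega \ominus \alpha_j)\,d\mu(\alpha_j)\right], \tag{33}$$

$$\mathrm{Yy}^{(p,q)}(\nu,\tau) = (\mathrm{Hh}_{pq} \heartsuit^{p} X\, \diamondsuit^{q} x)(\nu,\tau) :=$$

$$= \int_{\alpha_1\in\Omega^*}\cdots\int_{\alpha_p\in\Omega^*}\int_{\sigma_1\in\Omega}\cdots\int_{\sigma_q\in\Omega} \mathrm{Hh}_{pq}(\alpha_1,\ldots,\alpha_p; \sigma_1,\ldots,\sigma_q)\left[\prod_{j=1}^{p} X(\nu \ominus \alpha_j)\,d\mu(\alpha_j)\right] \times \left[\prod_{i=1}^{q} x(\tau \boxminus \sigma_i)\,d\mu(\sigma_i)\right], \tag{34}$$

$$\mathrm{yY}(\tau_1,\ldots,\tau_q; \nu_1,\ldots,\nu_p) := (\mathrm{wW} \clubsuit^{q} f\, \spadesuit^{p} G)(\tau_1,\ldots,\tau_q; \nu_1,\ldots,\nu_p) =$$

$$= \int_{t\in\Omega}\int_{\omega\in\Omega^*} \mathrm{wW}(t,\omega)\left[\prod_{i=1}^{q} \bar f(t \boxminus \tau_i)\right]\left[\prod_{j=1}^{p} \bar G(\omega \ominus \nu_j)\right]d\mu(t)\,d\mu(\omega), \tag{35}$$

$$\mathrm{Yy}(\nu_1,\ldots,\nu_p; \tau_1,\ldots,\tau_q) = (\mathrm{Ww} \spadesuit^{p} F\, \clubsuit^{q} g)(\nu_1,\ldots,\nu_p; \tau_1,\ldots,\tau_q) :=$$

$$= \int_{\omega\in\Omega^*}\int_{t\in\Omega} \mathrm{Ww}(\omega,t)\left[\prod_{j=1}^{p} \bar F(\omega \ominus \nu_j)\right]\left[\prod_{i=1}^{q} \bar g(t \boxminus \tau_i)\right]d\mu(\omega)\,d\mu(t) \tag{36}$$

are called the (p+q)-th order TF and FT Volterra convolutions and correlations
of signals and spectra, respectively.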

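Adding TF and FT Volterra convolutions and correlations of all orders we obtain
the full TF and FT Volterra convolutions and correlations

$$\mathrm{yY}(t,\omega) = \mathrm{Volt}^{\diamondsuit\heartsuit}[\mathrm{hH}; x, X](t,\omega) = \sum_{q=1}^{\infty}\sum_{p=1}^{\infty} (\mathrm{hH}_{qp} \diamondsuit^{q} x\, \heartsuit^{p} X)(t,\omega),$$

$$\mathrm{Yy}(\nu,\tau) = \mathrm{Volt}^{\heartsuit\diamondsuit}[\mathrm{Hh}; X, x](\nu,\tau) = \sum_{p=1}^{\infty}\sum_{q=1}^{\infty} (\mathrm{Hh}_{pq} \heartsuit^{p} X\, \diamondsuit^{q} x)(\nu,\tau),$$

$$\mathrm{Voltcor}^{\clubsuit\spadesuit}[\mathrm{hH}; f, G](\tau_1,\ldots,\tau_q,\ldots; \nu_1,\ldots,\nu_p,\ldots) = \sum_{q=1}^{\infty}\sum_{p=1}^{\infty} (\mathrm{hH} \clubsuit^{q} f\, \spadesuit^{p} G)(\tau_1,\ldots,\tau_q; \nu_1,\ldots,\nu_p),$$

$$\mathrm{VoltCor}^{\spadesuit\clubsuit}[\mathrm{Hh}; F, g](\nu_1,\nu_2,\ldots,\nu_p,\ldots; \tau_1,\ldots,\tau_q,\ldots) := \sum_{p=1}^{\infty}\sum_{q=1}^{\infty} (\mathrm{Hh} \spadesuit^{p} F\, \clubsuit^{q} g)(\nu_1,\ldots,\nu_p; \tau_1,\ldots,\tau_q),$$

where $\mathrm{hH} := [\mathrm{hH}_{qp}]$, $\mathrm{Hh} := [\mathrm{Hh}_{pq}]$
are infinite matrices.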

Similarly, we can define TD-invariant Volterra convolutions of TF and FT 
distributions. 

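Definition 11. The following functions are called the TD-invariant and
DT-invariant p-th and q-th order Volterra convolutions of TF and FT
distributions:

$$\mathrm{yY}^{(p)}(t,\omega) := (\mathrm{hH}_{p} (\diamondsuit\heartsuit)^{p}\, \mathrm{xX})(t,\omega) =$$

$$= \int_{\sigma_1\in\Omega}\cdots\int_{\sigma_p\in\Omega}\int_{\alpha_1\in\Omega^*}\cdots\int_{\alpha_p\in\Omega^*} \mathrm{hH}_{p}(\sigma_1,\ldots,\sigma_p; \alpha_1,\ldots,\alpha_p) \times \left[\prod_{k=1}^{p} \mathrm{xX}(t \boxminus \sigma_k, \omega \ominus \alpha_k)\,d\mu(\sigma_k)\,d\mu(\alpha_k)\right], \tag{37}$$

$$\mathrm{Yy}^{(p)}(\nu,\tau) := (\mathrm{Hh}_{p} (\heartsuit\diamondsuit)^{p}\, \mathrm{Xx})(\nu,\tau) =$$

$$= \int_{\alpha_1\in\Omega^*}\cdots\int_{\alpha_p\in\Omega^*}\int_{\sigma_1\in\Omega}\cdots\int_{\sigma_p\in\Omega} \mathrm{Hh}_{p}(\alpha_1,\ldots,\alpha_p; \sigma_1,\ldots,\sigma_p) \times \left[\prod_{k=1}^{p} \mathrm{Xx}(\nu \ominus \alpha_k, \tau \boxminus \sigma_k)\,d\mu(\alpha_k)\,d\mu(\sigma_k)\right], \tag{38}$$

$$\mathrm{yY}(\tau_1,\ldots,\tau_q; \nu_1,\ldots,\nu_p) = (\mathrm{fF}_{qp} \clubsuit^{q}\spadesuit^{p}\, \mathrm{gG})(\tau_1,\ldots,\tau_q; \nu_1,\ldots,\nu_p) :=$$

$$:= \int_{t\in\Omega}\int_{\omega\in\Omega^*} \mathrm{gG}(t,\omega)\,\mathrm{fF}(t \boxminus \tau_1,\ldots,t \boxminus \tau_q; \omega \ominus \nu_1,\ldots,\omega \ominus \nu_p)\,d\mu(t)\,d\mu(\omega),$$

$$\mathrm{Yy}(\nu_1,\ldots,\nu_p; \tau_1,\ldots,\tau_q) = (\mathrm{Ff}_{pq} \spadesuit^{p}\clubsuit^{q}\, \mathrm{Gg})(\nu_1,\ldots,\nu_p; \tau_1,\ldots,\tau_q) :=$$

$$:= \int_{\omega\in\Omega^*}\int_{t\in\Omega} \mathrm{Ff}(\omega,t)\,\mathrm{Gg}(\omega \ominus \nu_1,\ldots,\omega \ominus \nu_p; t \boxminus \tau_1,\ldots,t \boxminus \tau_q)\,d\mu(\omega)\,d\mu(t),$$

respectively, and

$$\mathrm{yY}(t,\omega) = \mathrm{Volt}^{\diamondsuit\heartsuit}[\mathrm{hH}; \mathrm{xX}](t,\omega) = \sum_{p=1}^{\infty} [\mathrm{hH}_{p} (\diamondsuit\heartsuit)^{p}\, \mathrm{xX}](t,\omega),$$

$$\mathrm{Yy}(\nu,\tau) = \mathrm{Volt}^{\heartsuit\diamondsuit}[\mathrm{Hh}; \mathrm{Xx}](\nu,\tau) = \sum_{q=1}^{\infty} [\mathrm{Hh}_{q} (\heartsuit\diamondsuit)^{q}\, \mathrm{Xx}](\nu,\tau),$$

$$\mathrm{Voltcor}^{\clubsuit\spadesuit}[\mathrm{fF}, \mathrm{gG}](\tau_1,\ldots,\tau_q,\ldots; \nu_1,\ldots,\nu_p,\ldots) = \sum_{q=1}^{\infty}\sum_{p=1}^{\infty} (\mathrm{fF}_{qp} \clubsuit^{q}\spadesuit^{p}\, \mathrm{gG})(\tau_1,\ldots,\tau_q; \nu_1,\ldots,\nu_p),$$

$$\mathrm{VoltCOR}^{\spadesuit\clubsuit}[\mathrm{Ff}, \mathrm{Gg}](\nu_1,\ldots,\nu_p,\ldots; \tau_1,\ldots,\tau_q,\ldots) = \sum_{p=1}^{\infty}\sum_{q=1}^{\infty} (\mathrm{Ff}_{pq} \spadesuit^{p}\clubsuit^{q}\, \mathrm{Gg})(\nu_1,\ldots,\nu_p; \tau_1,\ldots,\tau_q)$$

will be called full $\mathcal{F}$-generalized Volterra convolutions and
correlations of TF and FT distributions, respectively.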



3.2 Generalized Time-Frequency Higher-Order Distributions

In this section we propose new higher-order time-frequency distributions
associated with an arbitrary orthogonal Fourier transform: Higher-Order
Generalized Ambiguity Functions (HOG-AF) and Higher-Order Generalized Wigner
Distributions (HOG-WD). The HOG-WD is a q-th order uni-time/multi-frequency
(UT/MF) distribution that is based on the q-th order time-varying moments of
deterministic signals.

Definition 12. The higher-order $\mathcal{F}$-generalized
multi-frequency/uni-time and multi-time/uni-frequency symmetric and asymmetric
cross-Wigner distributions are defined as the q-D Fourier
$\mathcal{F}$-transforms of q-th order symmetric and asymmetric local
cross-correlation functions, respectively.



Wd®[/, 5 ](wi,...,a;,,t) := jr ••• jr {fg'*(n, . . . , r,, t)} = 

Tq^UJq 



-hi 


q 

f{tB{T))Y[{Gg){tB{r)mn) 


q 

W_Ti^i{Ti)dpL{Ti) 


Ti^Q Tq^fl 


i^l 





wB;[F,G]{h,...,tp,u;) := jz-i . . . jz-i {FG{n,, . . . ,,,p,co)} = 




1^1 G 12* 



J 

l/p G 



P 

F{uiQ {v))Y^CjG){ui 0 {v) 0 Vj) 
i=i 



p 

j=i 



where (t) := i”*) (^) •= centered time and frequency, 

respectively, $C_i$ is the i-th conjugation operator (it conjugates the signal
or spectrum if the index i is even) and



Wd“[/, 5 ](wi,...,wq,t) := jp ••• jp <f{t)Y[g{tBn)[ = 



i=l 



= f(t) 




</ 2 wi©...©Wp {t), 
wr>;[F,G]{h,...,tp,u) := \F{co)l[Giioei^j) \ = 

1^1 l^q ^tq 






= F{u 



Y[9(tj) 

i=i 



(Pu;{ti ffl ... ffl Tp). 



Definition 13. The higher-order $\mathcal{F}$-generalized
multi-time/uni-frequency and multi-frequency/uni-time symmetric and asymmetric
ambiguity functions are defined as the 1-D Fourier $\mathcal{F}$-transforms of
q-th order symmetric and asymmetric local cross-correlation functions,
respectively,



3F‘l[f,g\{Tl,...,Tq,v) ■.= T {fg''(ri, . . . ,Tq,t)} = 



= / /(iB(r)) 



Y[iGg){tB{r)BT/ 



ip„{t)dp,{t), 



Afp[F, G](i^i, . . . , Up, t) := jP-i {FG'*(:^i, . ..,Vp, w)} = 



F{coe{n)) 






0 {v) 0 Vj) 

j=i 



(Pu;{T)dfJ,{uj), 



where (r) := F, {v) '■= TTj=i 0' centered time and frequency, 

respectively, and 



aF“[/,g](Ti, ...,r,,i/) := F {fg“(ri, . . . , r,, t)} = 



= I m 



q 

Y[g{tSTi) 



<py{f)dp.{f), 



A^l[F,G]{v^,...,yp,T) := 



j:-i {FG^{vx, . . . ,Vp,u)} = 



= I F{^) 



0 nj) 



f=i 



(p^{T)dfi{uj). 



Fig. 1 contains block diagrams relating the different generalized higher-order
(q+1)-D distributions. We can obtain generalized Cohen's class distributions
as the generalized convolution of the Wigner distribution
$\mathrm{Wd}(\omega,t)$ with $\mathrm{Co}(\omega,t)$:

$$\mathrm{Ft}(\omega,t) := \int_{\nu\in\Omega^*}\int_{\tau\in\Omega} \mathrm{Co}(\nu,\tau)\,\mathrm{Wd}(\omega \ominus \nu, t \boxminus \tau)\,d\mu(\nu)\,d\mu(\tau).$$

As in the classical case, the purpose of Cohen's kernel
$\mathrm{Co}(\omega,t)$ is to filter out cross terms and maintain the
resolution of the auto terms.
Fig. 1. Diagram of relations between the different generalized higher-order
(q + 1)-D distributions



4 Conclusion 

In this paper we have examined the idea of a generalized shift operator
associated with an arbitrary orthogonal transform, and generalized linear and
nonlinear convolutions based on these generalized shift operators. Such
operators permit us to unify and generalize the majority of known methods and
tools of signal processing based on the classical Fourier transform.
Abstract. A novel scheme for texture segmentation is presented. Our
algorithm is based on generalizing the intensity-based geodesic active
contours model to the Gabor spatial-feature space of images. First, we
apply the Gabor-Morlet transform to the image using self-similar Gabor
functions, and then implement the geodesic active snakes mechanism
in this space. The spatial-feature space is represented, via the Beltrami
framework, as a Riemannian manifold. The stopping term in the
geodesic snake mechanism is generalized and is derived from the metric
of the Gabor spatial-feature manifold. Experimental results obtained by
applying the scheme to test images are presented.

Keywords: Texture segmentation, Gabor analysis, Geodesic active contours,
Beltrami framework, Anisotropic diffusion, Image manifolds.



1 Introduction 

Image segmentation is an important issue in image analysis. Usually it is based
on intensity features, e.g. gradients. However, real-life images usually contain
additional features such as textures and colors that determine image structure.
In order to achieve texture segmentation (detecting the boundary between
texturally homogeneous regions), it is necessary to generalize the definition
of segmentation to features other than intensity.

Since real world textures are difficult to model mathematically, no exact 
definition for texture exists. Therefore, ad-hoc approaches to the analysis of 
texture have been used, including local geometric primitives [8], local statistical 
features [3] and random field models [7,4]. A more general theory, based on the 
human visual system has emerged, in which texture features are extracted using 
Gabor filters [20]. 

The motivation for the use of Gabor filters in texture analysis is twofold.
First, it is believed that simple cells in the visual cortex can be modeled by
Gabor functions [16,5], and that the Gabor scheme provides a suitable
representation for visual information in the combined frequency-position
space [19]. Second, the Gabor representation has been shown to be optimal in
the sense of minimizing the joint two-dimensional uncertainty in the combined
spatial-frequency space [6]. The analysis of Gabor filters was generalized to
multi-window Gabor filters [23] and to Gabor-Morlet wavelets [19,23,17,12], and
studied both analytically and experimentally on various classes of images [23].
A first attempt to use the Gabor feature space for segmentation was made by
Lee et al. [13], who use a variant of the Mumford-Shah functional adapted to
some features in the Gabor space. Our method differs from theirs in using the
entire information obtained by the Gabor analysis and in using a different
segmentation technique.

In the last ten years, a great deal of attention has been given to the "snakes",
or active contour models, which were proposed by Kass et al. [9] for
intensity-based image segmentation. In this framework an initial contour is
deformed towards the boundary of an object to be detected. The evolution
equation is derived from minimization of an energy functional, which attains a
minimum for a curve located at the boundary of the object.

The geodesic active contours model [2] offers a different perspective for
solving the boundary detection problem: it is based on the observation that the
energy minimization problem is equivalent to finding a geodesic curve in a
Riemannian space whose metric is derived from the image contents. The geodesic
curve can be found via a geometric flow. Utilization of the Osher and Sethian
level set numerical algorithm [21] allows automatic handling of changes of
topology.

It was shown recently that the Gaborian spatial-feature space can be
described, via the Beltrami framework [22], as a 4D Riemannian manifold [11]
embedded in $\mathbb{R}^6$. Based on this Riemannian structure we generalize
the intensity-based geodesic active contours method and apply it to the
Gabor-feature space of images. Similar approaches, where the geodesic snakes
scheme is applied to some feature space of the image, were studied by Lorigo
et al. [14], who used both intensity and its variance for the segmentation of
MRI images, and by Paragios et al. [18], who generate the image's texture
feature space by filtering the image using Gabor filters. Texture information
is then expressed using statistical measurements. Texture segmentation is
achieved by application of geodesic snakes to obtain the boundaries in the
statistical feature space.

The aim of our study is to generalize the intensity-based geodesic active 
snakes method and apply it to the actual Gabor-feature space of images. 

2 Geodesic Active Contours 

In this section we review the geodesic active contours method for non-textured 
images [2]. The generalization of the technique for texture segmentation is de- 
scribed in section 4. 

Let $C(q) : [0,1] \to \mathbb{R}^2$ be a parametrized curve, and let
$I : [0,a] \times [0,b] \to \mathbb{R}^+$ be the given image. Let
$E(r) : [0,\infty[ \to \mathbb{R}^+$ be an inverse edge detector, so that $E$
approaches zero when $r$ approaches infinity. Visually, $E$ should represent the
edges in the image, so that we can judge the "quality" of the stopping term $E$
by the way it represents the edges and boundaries in an image. Thus, the
stopping term $E$ has a fundamental role in the geodesic active snakes
mechanism: if it does not represent the edges well, application of the snakes
mechanism is likely to fail. Minimizing the energy functional proposed in the
classical snakes is generalized to finding a geodesic curve in a Riemannian
space by minimizing

$$L_R = \int E(|\nabla I(C(q))|)\,|C'(q)|\,dq. \tag{1}$$

We may see this term as a weighted length of a curve, where the Euclidean
length element is weighted by $E(|\nabla I(C(q))|)$. The latter contains
information regarding the boundaries in the image. The resultant evolution
equation is the gradient descent flow

$$\frac{\partial C(t)}{\partial t} = E(|\nabla I|)\,\kappa\,\mathbf{N} - (\nabla E \cdot \mathbf{N})\,\mathbf{N}, \tag{2}$$

where $\kappa$ denotes curvature and $\mathbf{N}$ is the unit normal to the curve.

If we now define a function $U$ so that $C = \{(x,y) \mid U(x,y) = 0\}$, we may
use the Osher-Sethian level-set approach [21] and replace the evolution
equation for the curve $C$ with an evolution equation for the embedding
function $U$:

$$\frac{\partial U(t)}{\partial t} = |\nabla U|\,\mathrm{div}\!\left(E(|\nabla I|)\,\frac{\nabla U}{|\nabla U|}\right).$$



A popular choice for the stopping function $E(|\nabla I|)$ is given by

$$E(I) = \frac{1}{1 + |\nabla I|^2}. \tag{3}$$
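The stopping function of Eq. (3) can be illustrated with a minimal sketch. It assumes the image is a plain 2D list of floats and that gradients are approximated by central differences (one-sided at the borders); the function name `stopping_function` and this discretization are illustrative choices, not part of the original method.

```python
def stopping_function(image):
    """Return E = 1 / (1 + |grad I|^2) for a 2D list of floats.

    Gradients use central differences in the interior and one-sided
    differences at the borders (an assumed discretization).
    """
    rows, cols = len(image), len(image[0])

    def diff(get, i, n):
        # Central difference inside, one-sided at the borders.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        return (get(hi) - get(lo)) / (hi - lo) if hi > lo else 0.0

    E = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ix = diff(lambda k: image[y][k], x, cols)   # dI/dx
            iy = diff(lambda k: image[k][x], y, rows)   # dI/dy
            E[y][x] = 1.0 / (1.0 + ix * ix + iy * iy)
    return E


# A vertical step edge: dark left half, bright right half.
step = [[0.0, 0.0, 0.0, 10.0, 10.0, 10.0] for _ in range(4)]
E = stopping_function(step)
```

In flat regions $E$ equals 1, while next to the step it drops sharply, which is what lets it act as a stopping term for the evolving contour.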



3 Feature Space and Gabor Transform 

The Gabor scheme and Gabor filters have been studied by numerous researchers
in the context of image representation, texture segmentation and image retrieval.
A Gabor filter centered at the 2D frequency coordinates $(U,V)$ has the general
form

$$h(x,y) = g(x',y')\exp(2\pi i(Ux + Vy)), \tag{4}$$

where

$$(x',y') = (x\cos\phi + y\sin\phi,\; -x\sin\phi + y\cos\phi), \tag{5}$$

and

$$g(x,y) = \frac{1}{2\pi\lambda\sigma^2}\exp\left(-\frac{(x/\lambda)^2 + y^2}{2\sigma^2}\right), \tag{6}$$

where $\lambda$ is the aspect ratio between the $x$ and $y$ scales, $\sigma$ is
the scale parameter, and the major axis of the Gaussian is oriented at angle
$\phi$ relative to the $x$-axis and to the modulating sinewave gratings.

Accordingly, the Fourier transform of the Gabor function is

$$H(u,v) = \exp\left(-2\pi^2\sigma^2\left((u'-U')^2\lambda^2 + (v'-V')^2\right)\right), \tag{7}$$

where $(u',v')$ and $(U',V')$ are rotated frequency coordinates. Thus, $H(u,v)$
is a bandpass Gaussian with minor axis oriented at angle $\phi$ from the
$u$-axis, and the radial center frequency $F$ is defined by
$F = \sqrt{U^2 + V^2}$, with orientation $\theta = \arctan(V/U)$. Since maximal
resolution in orientation is wanted, the filters whose sine gratings are
cooriented with the major axis of the modulating Gaussian are usually
considered ($\phi = \theta$ and $\lambda > 1$), and the Gabor filter is reduced
to $h(x,y) = g(x',y')\exp(2\pi i F x')$.
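As a minimal sketch of the reduced filter $h(x,y) = g(x',y')\exp(2\pi i F x')$, the following samples it on an integer grid. The function names, the grid size and the parameter values are hypothetical, and the Gaussian normalization $1/(2\pi\lambda\sigma^2)$ is an assumption chosen so that the Fourier transform has unit peak.

```python
import cmath
import math


def gabor(x, y, F, theta, sigma, lam):
    """Sample h(x, y) = g(x', y') * exp(2*pi*i*F*x') at one point."""
    # Rotate the coordinates by theta (Eq. 5 with phi = theta).
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    # Anisotropic Gaussian envelope; the normalization is an assumption.
    norm = 1.0 / (2.0 * math.pi * lam * sigma ** 2)
    envelope = norm * math.exp(-((xr / lam) ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    # Complex sinusoid along the rotated x' axis.
    return envelope * cmath.exp(2j * math.pi * F * xr)


def gabor_kernel(half_size, F, theta, sigma, lam):
    """Sample the filter on a (2*half_size + 1)^2 integer grid."""
    rng = range(-half_size, half_size + 1)
    return [[gabor(x, y, F, theta, sigma, lam) for x in rng] for y in rng]


# Hypothetical parameters: 0.2 cycles per pixel, oriented at 45 degrees.
kernel = gabor_kernel(half_size=7, F=0.2, theta=math.pi / 4, sigma=3.0, lam=1.5)
```

The real and imaginary parts of the kernel are the even and odd components of the filter; convolving an image with them gives the real and imaginary parts of the Gabor response at that frequency and orientation.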

It is possible to generate Gabor-Morlet wavelets from a single
mother-Gabor-wavelet by transformations such as translations, rotations and
dilations. We can generate, in this way, a set of filters for a known number of
scales, $S$, and orientations, $K$. We obtain the following filters for a
discrete subset of transformations: $h_{mn}(x,y) = a^{-m} g(x',y')$, where
$(x',y')$ are the spatial coordinates rotated by $\pi n / K$ and
$m = 0, \ldots, S-1$. Alternatively, one can obtain Gabor wavelets by
logarithmically distorting the frequency axis [19] or by incorporating
multiwindows [23]. In the latter case one obtains a more general scheme wherein
subsets of the functions constitute either wavelet sets or Gaborian sets.

The feature space of an image is obtained by the inner product of this set of
Gabor filters with the image:

$$W_{mn}(x,y) = R_{mn}(x,y) + iJ_{mn}(x,y) = I(x,y) * h_{mn}(x,y). \tag{8}$$



4 Application of Geodesic Snakes to the Gaborian 
Feature Space of Images 

The proposed approach enables us to use the geodesic snakes mechanism in the 
Gabor spatial feature space of images by generalizing the inverse edge indicator 
function E, which attracts in turn the evolving curve towards the boundary in 
the classical and geodesic snakes schemes. A special feature of our approach is 
the metric introduced in the Gabor space, and used as the building block for the 
stopping function E in the geodesic active contours scheme. 

Sochen et al. [22] proposed to view images and image feature spaces as
Riemannian manifolds embedded in a higher dimensional space. For example, a
gray scale image is a 2-dimensional Riemannian surface (manifold), with $(x,y)$
as local coordinates, embedded in $\mathbb{R}^3$ with $(X,Y,Z)$ as local
coordinates. The embedding map is $(X = x, Y = y, Z = I(x,y))$, and we write
it, by abuse of notation, as $(x,y,I)$. When we consider feature spaces of
images, e.g. color space, statistical moments space, and the Gaborian space, we
may view the image-feature information as an $N$-dimensional manifold embedded
in an $(N+M)$-dimensional space, where $N$ stands for the number of local
parameters needed to index the space of interest and $M$ is the number of
feature coordinates. For example, we may view the Gabor transformed image as a
2D manifold with local coordinates $(x,y)$ embedded in a 6D feature space. The
embedding map is $(x, y, \theta(x,y), \sigma(x,y), R(x,y), J(x,y))$, where $R$
and $J$ are the real and imaginary parts of the Gabor transform value, and
$\theta$ and $\sigma$ are the direction and scale for which a maximal response
has been achieved. Alternatively, we can represent the transform space as a 4D
manifold with coordinates $(x, y, \theta, \sigma)$ embedded in the same 6D
feature space. The embedding map, in this case, is
$(x, y, \theta, \sigma, R(x,y,\theta,\sigma), J(x,y,\theta,\sigma))$. The main
difference between the two approaches is whether $\theta$ and $\sigma$ are
considered to be local coordinates or feature coordinates. In any case, these
manifolds can evolve in their embedding spaces via some geometric flow.

A basic concept in the context of Riemannian manifolds is distance. For
example, we take a two-dimensional manifold $\Sigma$ with local coordinates
$(\sigma^1, \sigma^2)$. Since the local coordinates are curvilinear, the
distance is calculated using a positive definite symmetric bilinear form called
the metric, whose components are denoted by $g_{\mu\nu}(\sigma^1, \sigma^2)$:

$$ds^2 = g_{\mu\nu}\,d\sigma^\mu d\sigma^\nu, \tag{9}$$

where we used the Einstein summation convention: elements with identical
superscripts and subscripts are summed over.

The metric on the image manifold is derived using a procedure known as
pullback. The manifold's metric is then used for various geometrical flows. We
briefly review the pullback mechanism. More detailed information can be found
in [22].

Let $X : \Sigma \to M$ be an embedding of $\Sigma$ in $M$, where $M$ is a
Riemannian manifold with a metric $h_{ij}$ and $\Sigma$ is another Riemannian
manifold. We can use the knowledge of the metric on $M$ and the map $X$ to
construct the metric on $\Sigma$. This pullback procedure is as follows:

$$g_{\mu\nu}(\sigma^1,\sigma^2) = h_{ij}(X(\sigma^1,\sigma^2))\,\frac{\partial X^i}{\partial\sigma^\mu}\,\frac{\partial X^j}{\partial\sigma^\nu}, \tag{10}$$

where we used the Einstein summation convention, $i,j = 1, \ldots, \dim(M)$, and
$\sigma^1, \sigma^2$ are the local coordinates on the manifold $\Sigma$.

If we pull back the metric of a 2D image manifold from the Euclidean
embedding space $(x,y,I)$ we get:

$$(g_{\mu\nu}(x,y)) = \begin{pmatrix} 1 + I_x^2 & I_x I_y \\ I_x I_y & 1 + I_y^2 \end{pmatrix}. \tag{11}$$

The determinant of $g_{\mu\nu}$ yields the expression $1 + I_x^2 + I_y^2$. Thus,
we can rewrite the expression for the stopping term $E$ in the geodesic snakes
mechanism as follows:

$$E(|\nabla I|) = \frac{1}{1 + |\nabla I|^2} = \frac{1}{\det(g_{\mu\nu})}.$$
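The pullback for a Euclidean embedding space can be sketched in a few lines: with $h_{ij} = \delta_{ij}$ the induced metric is $J^T J$, where $J$ is the Jacobian of the embedding map. The helper names `pullback_metric` and `image_metric`, and the derivative values, are illustrative assumptions.

```python
def pullback_metric(jacobian):
    """Return g = J^T J for an embedding into Euclidean space.

    jacobian[i][mu] holds dX^i / d(sigma^mu); with h_ij = delta_ij the
    pullback formula reduces to a sum over the embedding coordinates i.
    """
    m, n = len(jacobian), len(jacobian[0])
    return [[sum(jacobian[i][mu] * jacobian[i][nu] for i in range(m))
             for nu in range(n)] for mu in range(n)]


def image_metric(ix, iy):
    """Metric of the surface (x, y, I(x, y)) from the image derivatives."""
    jacobian = [[1.0, 0.0],   # dX/dx, dX/dy
                [0.0, 1.0],   # dY/dx, dY/dy
                [ix, iy]]     # dZ/dx, dZ/dy with Z = I(x, y)
    return pullback_metric(jacobian)


# Hypothetical derivative values at one pixel.
g = image_metric(ix=2.0, iy=-3.0)
det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
stopping = 1.0 / det_g   # equals 1 / (1 + ix^2 + iy^2)
```

For these values the determinant is $1 + 4 + 9 = 14$, matching $1 + I_x^2 + I_y^2$.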

We may interpret the Gabor transform of an image as a function assigning a
value $W$ to each pixel's coordinates, scale and orientation. Thus, we may
view the Gabor transform of an image as a 4D manifold with local coordinates
$(x,y,\theta,\sigma)$ embedded in $\mathbb{R}^6$ with coordinates
$(x,y,\theta,\sigma,R,J)$. We may pull back the metric for the 4D manifold from
the 6D space, and use it to generate the stopping function $E$ for the geodesic
snakes mechanism. The metric derived for the 4D manifold is:

$$(g_{\mu\nu}) = \begin{pmatrix}
1 + R_x^2 + J_x^2 & R_x R_y + J_x J_y & R_x R_\theta + J_x J_\theta & R_x R_\sigma + J_x J_\sigma \\
R_x R_y + J_x J_y & 1 + R_y^2 + J_y^2 & R_y R_\theta + J_y J_\theta & R_y R_\sigma + J_y J_\sigma \\
R_x R_\theta + J_x J_\theta & R_y R_\theta + J_y J_\theta & 1 + R_\theta^2 + J_\theta^2 & R_\theta R_\sigma + J_\theta J_\sigma \\
R_x R_\sigma + J_x J_\sigma & R_y R_\sigma + J_y J_\sigma & R_\theta R_\sigma + J_\theta J_\sigma & 1 + R_\sigma^2 + J_\sigma^2
\end{pmatrix}. \tag{12}$$

The resulting stopping function $E$ is the inverse of the determinant of
$g_{\mu\nu}$. Here $E$ is a function of four variables ($x$, $y$, $\theta$ and
$\sigma$); therefore, we obtain an evolution of a 4D manifold in a 6D embedding
space.
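A minimal sketch of the 4D case follows. The metric is the identity plus the outer products of the gradients of $R$ and $J$ with respect to $(x, y, \theta, \sigma)$, and $E$ is the inverse of its determinant. The gradient values are hypothetical, and the Gaussian-elimination determinant is just one convenient way to evaluate a 4 x 4 determinant in plain Python.

```python
def gabor_metric_4d(grad_r, grad_j):
    """g = identity + grad_r grad_r^T + grad_j grad_j^T (4 x 4).

    grad_r and grad_j hold the partial derivatives of R and J with
    respect to (x, y, theta, sigma), in that order.
    """
    return [[(1.0 if a == b else 0.0)
             + grad_r[a] * grad_r[b] + grad_j[a] * grad_j[b]
             for b in range(4)] for a in range(4)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


grad_r = [0.5, -1.0, 0.2, 0.0]   # hypothetical (R_x, R_y, R_theta, R_sigma)
grad_j = [1.5, 0.3, -0.4, 0.1]   # hypothetical (J_x, J_y, J_theta, J_sigma)
g = gabor_metric_4d(grad_r, grad_j)
det_g = determinant(g)
E = 1.0 / det_g
```

Because the metric is a rank-two update of the identity, the determinant also has the closed form $(1 + |\nabla R|^2)(1 + |\nabla J|^2) - (\nabla R \cdot \nabla J)^2$, which gives 7.901 for the values above and can serve as a check.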

An alternative approach is to derive a stopping term $E$ which is a function of
$x$ and $y$ only. One way to achieve this is to take, for each pixel, the scale
and orientation for which the transform attains its maximum amplitude. Thus,
for each pixel, we obtain $W_{max}$, the maximum value of the transform, and
$\theta_{max}$ and $\sigma_{max}$, the orientation and scale that yielded this
maximum value. This approach results in a 2D manifold (with local coordinates
$(x,y)$) embedded in a 6D space (with coordinates
$(x, y, R(x,y), J(x,y), \theta(x,y), \sigma(x,y))$). If we use the pullback
mechanism described above we get the following metric:

$$(g_{\mu\nu}) = \begin{pmatrix}
1 + R_x^2 + J_x^2 + \sigma_x^2 + \theta_x^2 & R_x R_y + J_x J_y + \sigma_x \sigma_y + \theta_x \theta_y \\
R_x R_y + J_x J_y + \sigma_x \sigma_y + \theta_x \theta_y & 1 + R_y^2 + J_y^2 + \sigma_y^2 + \theta_y^2
\end{pmatrix}.$$

Again, we use the fact that the determinant of the metric is a positive definite
edge indicator to determine $E$ as the inverse of the determinant of
$g_{\mu\nu}$. Here $E$ is a function of the two spatial variables $x$ and $y$
only; therefore, we obtain an evolution of a 2D manifold in a 6D embedding
space.



5 Results and Discussion 

Geodesic snakes provide an efficient geometric flow scheme for boundary detec- 
tion, where the initial conditions include an arbitrary function U which implic- 
itly represents the curve, and a stopping term E which contains the information 
regarding the boundaries in the image. Gabor filters are optimally tuned to local- 
ized scale and orientation, and can therefore represent textural information. We 
actually generalize the definition of gradients which usually refers to intensity 
gradients over (x, y) to other possible gradients in scale and orientation. This 
gradient information is the input function E to the newly generalized geodesic 
snakes flow. 

In our application of geodesic snakes to textural images, we have used the
mechanism offered by [15] to generate the Gabor wavelets for five scales and four
orientations in a frequency range of 0.1-0.4 cycles per pixel. We note that this
choice is different from the usual scheme in vision, where there are four scales
and at least six orientations in use. In the geodesic snakes mechanism $U$ was
initialized to be a signed distance function [2].

Fig. 1. A synthetic image made up of 2D sinewave gratings of different
frequencies and orientations

Fig. 2. The stopping function $E$ calculated by means of the 2D manifold metric

Fig. 3. The stopping function $E$ of the first image calculated by using the
intensity-based definition $E(I) = \frac{1}{1 + |\nabla I|^2}$

Fig. 4. The resultant boundary

Fig. 5. An image composed of two textures taken from the Brodatz album of
textures [1]

We present the results of the 2D manifold approach for a synthetic image:
the original image, the resulting stopping term $E$ and the final boundary
detected. We present some preliminary results for the 4D manifold approach with
a Brodatz image: the resultant $E$ is projected on the X-Y plane for each scale
and orientation.



[Figure panels: scales 2-5 at orientations 2, 3 and 4]

Fig. 6. The stopping function $E$ for the Brodatz texture image, calculated by
using the 4D manifold metric. For full size images see the web page:
http://www-visl.technion.ac.il/gaborsnakes



The first image (Fig. 1) is a synthesized texture composed of a linear
combination of spatial sinewave gratings of different frequencies and
orientations. When the stopping term $E$ is calculated using the 2D manifold
metric, we obtain a clear picture of the texture gradients (i.e. where
significant changes in texture occur) in the image (Fig. 2), so the initial
contour is drawn to the desired boundary. As can be seen in Fig. 3, when $E$ is
calculated using intensity values only, $E(I) = \frac{1}{1 + |\nabla I|^2}$, the
texture gradients are not visible, and the resultant $E$ will probably not
attract the initial contour towards the boundary. Application of the geodesic
snakes algorithm using the 2D manifold approach results in an accurate
boundary, as can be seen in Fig. 4.

When we consider the entire Gabor spatial feature space, the stopping term $E$
is a function of four variables $x$, $y$, $\theta$ and $\sigma$. In more
complex (texture-wise) images such as the Brodatz textures (Fig. 5), taken from
[1], we may see the additional information that can be obtained. In Fig. 6 we
present $E$ as calculated for five scales and four orientations; however, only
the components containing significant information are presented in the figure.
We can see that information is preserved through scales. The $E$ function
contains more information when it is calculated by using the 4D manifold
approach than the $E$ function obtained by using the intensity-based approach
(Fig. 7). In other words, we obtain a clear division of the image into two
segments, which differ in their texture, and thereby get information about the
relevant edges. As our main goal is to determine the boundaries in the image, we
may deconvolve $E$ for each scale and orientation with an appropriate Gaussian
function in order to obtain better spatial resolution.

Fig. 7. The stopping function $E$ used in the application of the proposed scheme
to the Brodatz image using: (left) the 4D manifold approach incorporating
specific scale and orientation; (right) the intensity-based definition
$E(I) = \frac{1}{1 + |\nabla I|^2}$

The proposed texture segmentation scheme applies the geodesic active
contours algorithm to the Gabor space of images, whereas the original geodesic
snakes scheme relies on intensity gradients. Using the feature space of images
results in the detection of texture gradients. We treat the Gabor transformed
image as a 2D manifold embedded in a 6D space, or a 4D manifold embedded
in a 6D space, and calculate the local metric on the manifold using the
pullback method. We then integrate the metric information into the geodesic
snakes scheme. We have shown the feasibility of the proposed approach, and its
advantages over the intensity geodesic snakes when applied to multi-textured
images. This work is currently being extended by completing the application of
geodesic snakes to a 4D manifold $(x,y,\theta,\sigma)$ embedded in a 6D space
$(R,J,x,y,\theta,\sigma)$, and by the application of both schemes to medical
images.

Abstract. The Beltrami diffusion-type process, reformulated for the
purpose of image processing, is generalized to an adaptive forward-and- 
backward process and applied in localized image features’ enhancement 
and denoising. Images are considered as manifolds, embedded in higher 
dimensional feature-spaces that incorporate image attributes and fea- 
tures such as edges, color, texture, orientation and convexity. To control 
and stabilize the process, a nonlinear structure tensor is incorporated. 
The structure tensor is locally adjusted according to a gradient-type 
measure. Whereas for smooth areas it assumes positive values, and thus 
the diffusion is forward, for edges (large gradients) it becomes negative 
and the diffusion switches to a backward (inverse) process. The resultant 
combined forward-and-backward process accomplishes both local denois- 
ing and feature enhancement. 

Keywords: scale-space, image enhancement, color processing, Beltrami 
flow, anisotropic diffusion, inverse diffusion. 



1 Introduction 

Image denoising, enhancement and sharpening are important operations in the 
general fields of image processing and computer vision. The success of many 
applications, such as robotics, medical imaging and quality control depends in 
many cases on the results of these operations. Since images cannot be described 
as stationary processes, it is useful to consider local adaptive filters. These filters 
are best described as solutions of partial differential equations (PDE). 

The application of PDEs in image processing and analysis starts with the
linear scale-space approach [23,8], which applies the heat equation by
considering the noisy image as an initial condition. The associated filter is a
Gaussian with a time-varying scale. Perona and Malik [10], in their seminal
contribution, generalized the heat equation to a non-linear diffusion equation
where the diffusion coefficient depends upon image features, i.e. edges. This
work paved the way for a variety of PDE-based methods that were applied to
various problems in low-level vision (see [17] for an excellent introduction
and overview).

G. Sommer and Y. Y. Zeevi (Eds.): AFPAC 2000, LNCS 1888, pp. 319-328, 2000.

The Beltrami framework was recently proposed by Sochen et al. [13] as a
viewpoint that unifies many different algorithms and offers new possibilities
for the definition and solution of various tasks. Images and other vision
objects of interest such as derivatives, orientations, texture, sequences of
images, disparity in stereo vision, optical flow and more, are described as
embedded manifolds. The embedded manifold is equipped with a Riemannian
structure, i.e. a metric. The metric encodes the geometry of the manifold.
Non-linear operations on these objects are done according to the local geometry
of the specific object of interest. The iterative process is understood as an
evolution of the manifold. The evolution is a consequence of a non-linear PDE.
No global (timewise) kernels can be associated with these non-linear PDEs.
Short time kernels for these processes were derived recently in [15].

We generalize the works of Perona and Malik [10], Sochen et al. [13] and
Weickert [18] and show how one can design a structure tensor that controls the
non-linear diffusion process starting from the induced metric that is given in
the Beltrami framework. The proposed structure tensor is not restricted to be
positive definite: it can be positive or negative, and it switches between the
two according to image features. This results in a forward-and-backward
diffusion flow. Different regions of the image are forward or backward diffused
according to the local geometry within a neighborhood. The adaptive property of
the process, which expresses itself in the local decision on the direction of
the diffusion and on its strength, is the main novelty of this paper.

2 A Geometric Measure on Embedded Maps 

2.1 Images as Riemannian Manifolds 

According to the geometric approach to image representation, images are
considered to be two-dimensional Riemannian surfaces embedded in higher
dimensional spatial-feature Riemannian manifolds [13,5,6,5,7,16,14]. Let
$\sigma^\mu$, $\mu = 1,2$, be the local coordinates on the image surface and let
$X^i$, $i = 1,2,\ldots,m$, be the coordinates of the embedding space; then the
embedding map is given by

$$\left(X^1(\sigma^1,\sigma^2),\, X^2(\sigma^1,\sigma^2),\, \ldots,\, X^m(\sigma^1,\sigma^2)\right). \tag{1}$$

Riemannian manifolds are manifolds endowed with a bilinear positive-definite
symmetric tensor which constitutes a metric. Denote by $(\Sigma, (g_{\mu\nu}))$
the image manifold and its metric and by $(M, (h_{ij}))$ the spatial-feature
manifold and its corresponding metric. The induced metric can be calculated by
$g_{\mu\nu} = h_{ij}\,\partial_\mu X^i\,\partial_\nu X^j$. The map
$\mathbf{X} : \Sigma \to M$ has the following weight [11]:

$$E[X^i, g_{\mu\nu}, h_{ij}] = \int d^2\sigma\,\sqrt{g}\,g^{\mu\nu}\,(\partial_\mu X^i)(\partial_\nu X^j)\,h_{ij}(\mathbf{X}), \tag{2}$$

where the range of indices is $\mu,\nu = 1,2$ and $i,j = 1,\ldots,m = \dim M$,
and we use the Einstein summation convention: identical indices that appear one
up and one down are summed over. We denote by $g$ the determinant of
$(g_{\mu\nu})$ and by $(g^{\mu\nu})$ the inverse of $(g_{\mu\nu})$. In the above
expression $d^2\sigma\sqrt{g}$ is an area element of the image manifold. The
rest, i.e. $g^{\mu\nu}(\partial_\mu X^i)(\partial_\nu X^j)h_{ij}(\mathbf{X})$,
is a generalization of $L_2$. It is important to note that this expression (as
well as the area element) does not depend on the choice of local coordinates.

The feature evolves in a geometric way via the gradient descent equations

$$X^i_t \equiv \frac{\partial X^i}{\partial t} = -\frac{1}{2\sqrt{g}}\,h^{il}\,\frac{\delta E}{\delta X^l}. \tag{3}$$



Note that we used our freedom to multiply the Euler-Lagrange equations by 
a strictly positive function and a positive definite matrix. This factor is the 
simplest one that does not change the minimization solution while giving a 
reparameterization invariant expression. This choice guarantees that the flow is 
geometric and does not depend on the parameterization. 

Given that the embedding space is Euclidean, the variational derivative of $E$
with respect to the coordinate functions is given by

$$-\frac{1}{2\sqrt{g}}\,h^{il}\,\frac{\delta E}{\delta X^l} = \frac{1}{\sqrt{g}}\,\partial_\mu\!\left(\sqrt{g}\,g^{\mu\nu}\,\partial_\nu X^i\right) \equiv \Delta_g X^i, \tag{4}$$

where the operator $\Delta_g$ acting on $X^i$ is the natural generalization of
the Laplacian from flat surfaces to manifolds. In terms of the formalism
implemented in our study, this is called the second order differential
parameter of Beltrami [9], or in short the Beltrami operator.



2.2 The Metric as a Structure Tensor 

There have been a few works using anisotropic diffusion processes. Cottet and
Germain [2] used a smoothed version of the image to direct the diffusion, while
Weickert [20,19] also smoothed the structure tensor $\nabla I \nabla I^T$ and
then manipulated its eigenvalues to steer the smoothing direction. Eliminating
one eigenvalue from a structure tensor, first proposed as a color tensor in [3],
was used in [12], in which the tensors are not necessarily positive definite.
In [21,22], by contrast, the eigenvalues are manipulated to result in a positive
definite tensor. See also [1], where the diffusion is in the direction
perpendicular to the maximal gradient of the three color channels (this
direction is different from that of [12]).

Let us first show that the diffusion directions can be deduced from the
smoothed metric coefficients $g_{\mu\nu}$ and may thus be included within the
Beltrami framework under the right choice of directional diffusion coefficients.

The induced metric $(g_{\mu\nu})$ is a symmetric uniformly positive definite
matrix that captures the geometry of the image surface. Let $\lambda_1$ and
$\lambda_2$ be the largest and the smallest eigenvalues of $(g_{\mu\nu})$,
respectively. Since $(g_{\mu\nu})$ is a symmetric positive matrix, its
corresponding eigenvectors $u_1$ and $u_2$ can be chosen orthonormal. Let
$U \equiv (u_1 | u_2)$ and
$\Lambda \equiv \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$;
then we readily have the equality

$$(g_{\mu\nu}) = U \Lambda U^T. \tag{5}$$

Note also that

$$(g^{\mu\nu}) \equiv (g_{\mu\nu})^{-1} = U \Lambda^{-1} U^T = U \begin{pmatrix} 1/\lambda_1 & 0 \\ 0 & 1/\lambda_2 \end{pmatrix} U^T, \tag{6}$$

and that

$$g \equiv \det(g_{\mu\nu}) = \lambda_1 \lambda_2. \tag{7}$$

Our proposed enhancement procedure will control those eigenvalues adaptively
so that only meaningful edges will be enhanced, whereas smooth areas will be
denoised.

3 New Adaptive Structure Tensor 

3.1 Changing the Eigenvalues 

From the above derivation of the metric $g_{\mu\nu}$, it follows that the larger
eigenvalue $\lambda_1$ corresponds to the eigenvector in the gradient direction
(in the 3D Euclidean case: $(I_x, I_y)^T$). The smaller eigenvalue $\lambda_2$
corresponds to the eigenvector perpendicular to the gradient direction (in the
3D Euclidean case: $(-I_y, I_x)^T$). The eigenvectors are equal for both
$g_{\mu\nu}$ and its inverse $g^{\mu\nu}$, whereas the eigenvalues have
reciprocal values. We can use the eigenvalues as a means to control the
Beltrami flow process. For convenience let us define
$\lambda^1 \equiv 1/\lambda_1$. As the first eigenvalue of $g^{\mu\nu}$ (that
is, $\lambda^1$) increases, so does the diffusion force in the gradient
direction. Thus, by changing this eigenvalue we can reduce, eliminate or even
reverse the diffusion process across the gradient.
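As a worked check of these statements, assume the simplest case of a gray-level image embedded as $(x, y, I)$ in 3D Euclidean space (a single channel, no smoothing of the metric). The metric is then the identity plus a rank-one term, and its eigenpairs can be read off directly:

```latex
\begin{aligned}
(g_{\mu\nu}) &= \mathrm{Id} + \nabla I\,\nabla I^{T}, \\
(g_{\mu\nu})\,\nabla I &= \left(1 + |\nabla I|^{2}\right)\nabla I
  &&\Rightarrow\ \lambda_1 = 1 + |\nabla I|^{2}, \\
(g_{\mu\nu})\,\nabla I^{\perp} &= \nabla I^{\perp}
  &&\Rightarrow\ \lambda_2 = 1, \\
\lambda^{1} &= \frac{1}{\lambda_1} = \frac{1}{1 + |\nabla I|^{2}}, \qquad
g = \lambda_1 \lambda_2 = 1 + |\nabla I|^{2}.
\end{aligned}
```

Under this assumption the unmodified $\lambda^1$ lies in $(0, 1]$ and shrinks as the gradient grows, so the diffusion across an edge weakens but never changes sign.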

What would be the best strategy to control the diffusion process via
adjustment of the relevant parameters? There are a few requirements that might
be considered as guidelines:

- The enhancement should essentially apply to the important features, while
  originally smooth segments should not be enhanced.

- The contradictory processes of enhancement and noise reduction by smoothing
  (filtering) should coexist.

- The process should be as stable as possible, though restoration and
  enhancement processes are inherently unstable.

Let us define $\lambda^1(s)$ as a new adaptive eigenvalue to be used instead of
the original $\lambda^1$. We propose that this new eigenvalue depend on the
combined gradient magnitude of the three channels (colors) $|\nabla I_\Sigma|$
(that is, $\lambda^1 = \lambda^1(|\nabla I_\Sigma|)$) in the following way:

$$\lambda^1(s) = \begin{cases}
1 - (s/k_f)^n, & 0 \le s \le k_f \\
\alpha\left[((s - k_b)/w)^{2m} - 1\right], & k_b - w \le s \le k_b + w \\
0, & \text{otherwise}
\end{cases} \tag{8}$$

and its smoothed version:

$$\lambda^1_\sigma(s) = \lambda^1(s) * G_\sigma(s), \tag{9}$$

where $|\nabla I_\Sigma| \equiv \left(\sum_i |\nabla I^i|^2\right)^{1/2}$, $*$
denotes convolution, and $k_f < k_b - w$. We chose the exponent parameters $n$
and $m$ to be 4 and 1, respectively.

The new structure tensor has to be continuous and differentiable. In the 
discrete domain, (8) could suffice (although it is only piecewise differentiable), 
whereas (9) can fit the general continuous case. Other types of eigenvalue ma- 
nipulation with similar nature may be considered. 
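The piecewise eigenvalue of Eq. (8) translates directly into code. The sketch below is illustrative only: the function name and the sample parameter values are assumptions, with $\alpha$ set by the rule $\alpha = k_f/(2k_b)$ and the exponents at the stated values $n = 4$, $m = 1$.

```python
def adaptive_eigenvalue(s, kf, kb, w, alpha, n=4, m=1):
    """Piecewise eigenvalue of Eq. (8) for a gradient magnitude s >= 0."""
    if 0.0 <= s <= kf:
        return 1.0 - (s / kf) ** n                         # forward diffusion
    if kb - w <= s <= kb + w:
        return alpha * (((s - kb) / w) ** (2 * m) - 1.0)   # backward diffusion
    return 0.0                                             # no diffusion


# Hypothetical parameters (arbitrary gradient units), with kf < kb - w.
kf, kb, w = 0.5, 4.0, 2.0
alpha = kf / (2.0 * kb)
samples = [adaptive_eigenvalue(s, kf, kb, w, alpha)
           for s in (0.0, 0.25, 1.0, 4.0, 6.0, 7.0)]
```

The samples cover the three regimes: positive values for small gradients (smoothing), zero between $k_f$ and $k_b - w$, and negative values inside the backward range, with the most negative value $-\alpha$ at $s = k_b$.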

The parameter $k_f$ is essentially the limit of gradients to be smoothed out,
whereas $k_b$ and $w$ define the range of the backward diffusion, and should
assume values of gradients that we want to emphasize. In our formula the range
is symmetric, and we restrain the width from overlapping the forward diffusion
area. One way of choosing these parameters in the discrete case is by
calculating the mean absolute gradient (MAG).

The parameter $\alpha$ determines the ratio between the backward and forward
diffusion. If $\alpha$ is so large that the backward diffusion process becomes
too dominant, the stabilizing forward process can no longer prevent
oscillations. One can avoid the evolution of new singularities in smooth areas
by bounding the maximum flux resulting from the backward diffusion to be smaller
than the maximum effected by the forward one. Formally, we say:

$$\max_{s < k_f}\{s\lambda^1(s)\} > \max_{k_b - w < s < k_b + w}\{|s\lambda^1(s)|\}. \tag{10}$$

In the case of our proposed eigenvalue, we get a simple formula for $\alpha$,
which just obeys this inequality:

$$\alpha = k_f/(2k_b), \quad \text{for any } 0 < w < k_b - k_f. \tag{11}$$

In practical applications, this bound can be doubled in value without
experiencing major instabilities.

See [4] for elaboration on the forward and backward diffusion for signal en- 
hancement. 
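The flux condition can be checked numerically for a given parameter set. The following sketch scans the flux $s\lambda^1(s)$ on a grid and compares the largest forward flux with the largest backward flux magnitude. The parameter values and the grid resolution are assumptions for illustration; the result applies to these values only and is not a proof of the bound.

```python
def eigenvalue(s, kf, kb, w, alpha, n=4, m=1):
    """Adaptive eigenvalue: forward for s <= kf, backward around kb."""
    if 0.0 <= s <= kf:
        return 1.0 - (s / kf) ** n
    if kb - w <= s <= kb + w:
        return alpha * (((s - kb) / w) ** (2 * m) - 1.0)
    return 0.0


def max_fluxes(kf, kb, w, alpha, steps=20000):
    """Return (max forward flux, max backward flux magnitude) on a grid."""
    top = kb + w
    forward = backward = 0.0
    for i in range(steps + 1):
        s = top * i / steps
        flux = s * eigenvalue(s, kf, kb, w, alpha)
        if flux > 0.0:
            forward = max(forward, flux)     # smoothing regime
        else:
            backward = max(backward, -flux)  # enhancement regime
    return forward, backward


# Hypothetical parameters, with alpha chosen as kf / (2 * kb).
kf, kb, w = 0.5, 4.0, 2.0
forward, backward = max_fluxes(kf, kb, w, alpha=kf / (2.0 * kb))
```

For these values the forward maximum is about 0.2675 and the backward maximum about 0.2641, so the margin is narrow. Other parameter choices should be checked the same way rather than assumed to satisfy the bound.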



3.2 The Algorithm 

The algorithm to implement the flow $I_t = \Delta_g I$ for color image
enhancement is as follows:

1. Compute the metric coefficients $g_{\mu\nu}$. For the $N$ channel case (for
color $N = 3$) we have

$$g_{\mu\nu} = \delta_{\mu\nu} + \sum_{k=1}^{N} I^k_\mu I^k_\nu. \tag{12}$$

2. Diffuse the $g_{\mu\nu}$ coefficients by convolving with a Gaussian of
variance $\rho$, thereby

$$\tilde g_{\mu\nu} = G_\rho * g_{\mu\nu}. \tag{13}$$

For 2D images $G_\rho = e^{-(x^2 + y^2)/\rho^2}$, up to a normalization constant.

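3. Compute the inverse smoothed metric $(\tilde g^{\mu\nu})$. Change the
eigenvalues $\lambda^1$, $\lambda^2$ ($\lambda^1 < \lambda^2$) of
$(\tilde g^{\mu\nu})$ so that $\lambda^1 = \lambda^1(s)$ and $\lambda^2 = a$
($a > 1$). This yields a new inverse structure tensor that is given by:

$$(\hat g^{\mu\nu}) = \tilde U \hat\Lambda \tilde U^T = \tilde U \begin{pmatrix} \lambda^1(s) & 0 \\ 0 & a \end{pmatrix} \tilde U^T, \tag{14}$$

where the columns of $\tilde U$ are the eigenvectors of the smoothed metric.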

4. Calculate the determinant of the new structure tensor. Note that $\hat g$
can now have negative values. In cases where the inverse eigenvalue $\lambda^1$
is zero, the structure tensor determinant should assume a large value
$M \gg 1$:

$$\hat g = \det(\hat g_{\mu\nu}) = \hat\lambda_1 \hat\lambda_2 = \begin{cases}
1/(a\lambda^1(s)), & \lambda^1(s) \ne 0 \\
M, & \text{otherwise.}
\end{cases} \tag{15}$$

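5. Evolve the $k$-th channel via the Beltrami flow

$$I^k_t = \Delta_{\hat g} I^k \equiv \frac{1}{\sqrt{\hat g}}\,\partial_\mu\!\left(\sqrt{\hat g}\,\hat g^{\mu\nu}\,\partial_\nu I^k\right). \tag{16}$$

Remark: In this flow we will not get imaginary values, even though the term
$\sqrt{\hat g}$ appears, because in cases of negative $\hat g$ the constant
imaginary factor $i = \sqrt{-1}$ cancels.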
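Steps 1, 3 and 4 can be sketched for a single pixel as follows. This is a simplified illustration under several assumptions: the Gaussian smoothing of step 2 is omitted, the new first eigenvalue `lam1_new` is passed in rather than computed from the gradient, `big` is a placeholder for the large value $M$, and all function names and numbers are hypothetical.

```python
import math


def metric_from_gradients(gradients):
    """Step 1: g = identity + sum_k grad(I^k) grad(I^k)^T for one pixel."""
    gxx = 1.0 + sum(ix * ix for ix, _ in gradients)
    gxy = sum(ix * iy for ix, iy in gradients)
    gyy = 1.0 + sum(iy * iy for _, iy in gradients)
    return gxx, gxy, gyy


def principal_direction(gxx, gxy, gyy):
    """Unit eigenvector of the largest eigenvalue of a symmetric 2x2 matrix."""
    angle = 0.5 * math.atan2(2.0 * gxy, gxx - gyy)
    return math.cos(angle), math.sin(angle)


def modified_inverse_tensor(gradients, lam1_new, a):
    """Steps 3-4: replace the inverse eigenvalues by (lam1_new, a)."""
    ux, uy = principal_direction(*metric_from_gradients(gradients))
    vx, vy = -uy, ux                        # perpendicular eigenvector
    # Rebuild the inverse tensor from its eigen-decomposition.
    inv = (lam1_new * ux * ux + a * vx * vx,
           lam1_new * ux * uy + a * vx * vy,
           lam1_new * uy * uy + a * vy * vy)
    big = 1e6                               # assumed stand-in for M >> 1
    det = 1.0 / (a * lam1_new) if lam1_new != 0.0 else big
    return inv, det


# Three color channels; only the first has a non-zero gradient here.
gradients = [(3.0, 4.0), (0.0, 0.0), (0.0, 0.0)]
inv, det = modified_inverse_tensor(gradients, lam1_new=-0.0625, a=2.0)
```

With a negative `lam1_new` the rebuilt tensor has a negative eigenvalue along the gradient direction, which is what reverses the diffusion across the edge, and the determinant of the structure tensor becomes negative accordingly.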



3.3 Variations to the Scheme 

As the process involves inverse diffusion for enhancement, it is by definition
not stable. To obtain a more stable process, which will denoise the image and
preserve its edges, one can set $\alpha = 0$; this removes the inverse diffusion
part and leaves us with a coherent denoising scheme.

There are a few ways to increase regularity in this PDE-based approach. One 
can replace the proposed conductance coefficient Eq. (8) by the smoothed one, 
Eq. (9). As presented in the algorithm, convolving the metric with a smooth- 
ing kernel, before manipulating it, increases the stability of the process. It is 
possible also to smooth smaller scales in a noisy signal by preprocessing. As we 
enhance the signal afterwards, this smoothing process does not affect the end 
result that much and enables us to operate in an originally much noisier envi- 
ronment. Finally, operating in extremely noisy areas, when we know of the type 
of singularity, we can apply more pre-smoothing, and consider only the largest 
gradient within the backward diffusion range. 

We can make $\lambda^1$ depend, instead of on the gradient, on similar
"edge detectors":

$$\lambda^1 = \lambda^1(g), \tag{17}$$

or the smoothed version:

$$\lambda^1 = \lambda^1(G_\rho * g), \tag{18}$$

or on the original eigenvalue itself:

$$\lambda^1 = \lambda^1(\lambda^1). \tag{19}$$

A local approach, which adjusts the parameters $k_f$, $k_b$, $w$ to take
different values in different segments of the image, is currently under
investigation.



4 Results and Conclusion 

The image feature enhancement procedure, developed in the framework of
geometry, incorporates a nonlinear adaptive structure tensor that controls the
enhancement process along gradients. In other words, the structure tensor is
locally adjusted according to a gradient-type measure. Whereas for smooth areas
it assumes positive values, and thus the diffusion is forward, for edges it
becomes negative and the diffusion switches to a backward (inverse) process.
In this way we accomplish both of the conflicting tasks of local denoising and
feature enhancement.

In Figure 1 the left eye of the Mandrill image is shown, before and after the 
application of the adaptive Beltrami process. It depicts efficient denoising of the 
retina, with sharp edges somewhat enhanced. In Figure 2 a blurred and noisy 
Tulip photo is processed, enhancing the center of the flower while denoising its 
background. In a detail enlargement of the same image (the flower’s pattern 
in Fig. 3) one can see more clearly that the bright curly outline of the leaf is 
enhanced (brighter in its center), whereas smooth areas are denoised. 

[For a closer look at the color images, please follow the web link:
http://www-visl.technion.ac.il/belt-fab]

Lastly, note that the general scheme can be easily degenerated into a coherent 
stable denoising scheme that preserves edges. 
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Original Beltrami-lensor-adaptive-local beta=40 ro=1 iter=70 



Fig. 1. Left - original eye image, right - enhanced and denoised eye image. 70
iterations, [kf, kb, w] = [0.5, 4, 2] * MAG, p = 1




Fig. 2. Left - original tulip, right - enhanced and denoised tulip. 40 iterations, 
[kf, kb, w] = [0.7, 5, 3] * MAG, p = 1 
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Abstract. Registration is a fundamental task in image processing. Its 
purpose is to find a geometrical transformation that relates the points of 
an image to their corresponding points of another image. The determi- 
nation of the optimal transformation depends on the types of variations 
between the images. In this paper we propose a robust method based on 
two sets of points representing the images. One-to-one correspondence is 
assumed between these two sets. Our approach finds a global affine
transformation between the sets of points and can be used in any
dimension $k \ge 1$. A sufficient existence condition for a unique solution is
given and proven. Our method can be used to solve various registration
problems arising in numerous fields, including medical image processing,
remotely sensed data processing, and computer vision.

Keywords: registration problem; matching sets of points 



1 Introduction 

There is an increasing number of applications that require accurate aligning of 
one image with another taken from different viewpoints, by different imaging 
devices, or at different times. A geometrical transformation is sought that
brings a floating image data set into precise spatial correspondence with a
reference image data set. This process of alignment is known as registration,
although other words, such as co-registration, matching, and fusion, are also 
used. Examples of systems where image registration is a significant component 
include aligning images from different medical modalities for diagnosis, matching 
a target with a real-time image of a scene for target recognition, monitoring 
global land usage using satellite images, and matching stereo images to recover 
shape for autonomous navigation [6,10]. 

The registration technique for a given task depends on the knowledge about 
the characteristics of the type of variations. Registration methods can be viewed 
as different combinations of choices for the following four components [6] : 
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— Search space is determined by the type of transformation we have to con- 
sider, i.e., what is the class of transformations that is capable of aligning 
the images. Some widely used types are rigid-body, where only translations and
rotations are allowed, affine, which maps parallel lines to parallel lines,
and nonlinear, which can transform straight lines to curves.

— Feature data set describes what kind of image properties are used in match- 
ing. 

— Similarity measure is a function of the transformation parameters which 
shows how well the floating and the reference image fit. The task of regis- 
tration is to optimize this function. 

— Search strategy determines what kind of optimization method to use. 

Figure 1 explains the major steps of a general registration process. 




Fig. 1. Major steps of a general registration process. Feature data sets $F_1$ and $F_2$
are extracted from reference image $I_1$ and floating image $I_2$, respectively.
Transformation $T$ is calculated using $F_1$ and $F_2$. $I_2$ is aligned to $I_1$ by applying $T$. A
brand new image $I_3$ can be calculated by fusing $I_1$ and $T(I_2)$



A general and robust approach to registration problems is to select points as
features. A general point-based method consists of three steps: first, the
points are identified; then points in the floating image are matched with points
in the reference image; finally, a spatial mapping is determined. Point-based
methods can be either interactive or automatic. With an interactive point-based
method, usually a few pairs of points (4-20) are identified and matched by
the user. Methods of this type are available for rigid-body [1] and nonlinear [4,9]
problems. Automatic determination of the features usually yields a huge number
of points. In this case finding correspondences can be rather difficult (e.g., the
two point sets need not have the same number of elements) and requires a
special algorithm. Widely used methods are the head-hat method [13], hierarchical
Chamfer matching [2,5], and the iterative closest point method [3]. These are used
mainly for rigid-body problems, but extension to more general transformations
is easy.

In this paper we propose an interactive point-based method. 

2 Affine Method for Aligning Two Sets of Points

In this section we propose a robust method based on identified pairs of points,
which assumes affine motion between the images. Let $k \ge 1$ denote the dimension
of the images and let $n$ be the number of pairs of points.

Our registration method is described by giving the following four components: 

— search space 

A global transformation described by a $(k+1) \times (k+1)$ matrix $T$ of the form

$$T = \begin{pmatrix} t_{11} & \cdots & t_{1k} & t_{1,k+1} \\ \vdots & \ddots & \vdots & \vdots \\ t_{k1} & \cdots & t_{kk} & t_{k,k+1} \\ 0 & \cdots & 0 & 1 \end{pmatrix}$$

is to be found. Given $T$ and a point $x = (x_1, \ldots, x_k) \in \mathbb{R}^k$, the
transformation sends $x$ to $y = (y_1, \ldots, y_k) \in \mathbb{R}^k$ if and only if
$(y_1, \ldots, y_k, 1)^T = T \cdot (x_1, \ldots, x_k, 1)^T$ holds for the corresponding
homogeneous coordinates [8]. Notice that each affine transformation can be
described this way (Fig. 2). This kind of transformation has $k \cdot (k+1)$ degrees
of freedom according to the matrix elements to be determined.

— feature data set 

A set of $n$ reference points $\{p_1, p_2, \ldots, p_n\}$, $p_i = (p_{i1}, \ldots, p_{ik}) \in \mathbb{R}^k$, and a
set of $n$ floating points $\{q_1, q_2, \ldots, q_n\}$, $q_i = (q_{i1}, \ldots, q_{ik}) \in \mathbb{R}^k$, are to be
identified in the reference image and the floating image, respectively (Fig. 3).
We assume that $q_i$ corresponds to $p_i$ ($1 \le i \le n$).

— similarity measure 

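Suppose that we get point $q'_i = (q'_{i1}, \ldots, q'_{ik})$ when point $q_i$ is transformed
by matrix $T$ ($1 \le i \le n$):

$$(q'_{i1}, \ldots, q'_{ik}, 1)^T = T \cdot (q_{i1}, \ldots, q_{ik}, 1)^T .$$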
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Fig. 2. Example of a 2D affine transformation: The original image (left) and 
the transformed one (right). Lines are mapped to lines, parallelism is preserved, 
but angles can be altered 



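Define the function $S$ of $k \cdot (k+1)$ variables as follows:

$$S(t_{11}, \ldots, t_{k,k+1}) = \sum_{i=1}^{n} \| q'_i - p_i \|^2 = \sum_{i=1}^{n} \sum_{j=1}^{k} (q'_{ij} - p_{ij})^2 = \sum_{i=1}^{n} \sum_{j=1}^{k} (t_{j1} q_{i1} + \cdots + t_{jk} q_{ik} + t_{j,k+1} - p_{ij})^2 .$$

It can be regarded as the matching error.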

— search strategy 

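The least-squares solution for matrix $T$ is determined by minimizing function
$S$. Direct matching is applied. Function $S$ can be minimal only if all of the
partial derivatives $\partial S / \partial t_{11}, \ldots, \partial S / \partial t_{k,k+1}$ are equal to zero. The required $k \cdot (k+1)$
equations are:

$$\frac{\partial S}{\partial t_{uv}} = 2 \sum_{i=1}^{n} (t_{u1} q_{i1} + \cdots + t_{uk} q_{ik} + t_{u,k+1} - p_{iu}) \, q_{iv} = 0 \quad (1 \le u, v \le k),$$

$$\frac{\partial S}{\partial t_{u,k+1}} = 2 \sum_{i=1}^{n} (t_{u1} q_{i1} + \cdots + t_{uk} q_{ik} + t_{u,k+1} - p_{iu}) = 0 \quad (1 \le u \le k).$$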
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Fig. 3. Example of identified pairs of points in 2D. Eight pairs (pi, qi) of points 
are identified in the reference image (left) and in the floating image (right), 
respectively 



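We get the following system of linear equations for each row $u$ of $T$ ($1 \le u \le k$):

$$M \cdot (t_{u1}, \ldots, t_{uk}, t_{u,k+1})^T = (c_{u1}, \ldots, c_{uk}, d_u)^T ,$$

where

$$M = \begin{pmatrix} a_{11} & \cdots & a_{1k} & b_1 \\ \vdots & \ddots & \vdots & \vdots \\ a_{k1} & \cdots & a_{kk} & b_k \\ b_1 & \cdots & b_k & n \end{pmatrix},$$

$$a_{uv} = \sum_{i=1}^{n} q_{iu} \, q_{iv}, \quad b_u = \sum_{i=1}^{n} q_{iu}, \quad c_{uv} = \sum_{i=1}^{n} p_{iu} \, q_{iv}, \quad d_u = \sum_{i=1}^{n} p_{iu} \quad (1 \le u, v \le k).$$

The above system of linear equations can be solved by using an appropriate
numerical method. There exists a unique solution if and only if $\det(M) \ne 0$.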
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3 Discussion 

In this section we state and prove a sufficient existence condition for a unique 
solution. 

By a hyperplane of the Euclidean space $\mathbb{R}^k$ we mean a subset of the form
$\{a + x : x \in S\}$ where $S$ is a $(k-1)$-dimensional linear subspace. Given some
points $q_1, \ldots, q_n$ in $\mathbb{R}^k$, we say that these points span $\mathbb{R}^k$ if no hyperplane of
$\mathbb{R}^k$ contains them. If any $k+1$ points from $q_1, \ldots, q_n$ span $\mathbb{R}^k$ then we say
that $q_1, \ldots, q_n$ are in general position.

Theorem. If $q_1, \ldots, q_n$ span $\mathbb{R}^k$ then $\det(M) \ne 0$.

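Proof. Suppose $\det(M) = 0$. Consider the vectors $v_j = (q_{1j}, q_{2j}, \ldots, q_{nj})$
($1 \le j \le k$) in $\mathbb{R}^n$, and let $v_{k+1} = (1, 1, \ldots, 1) \in \mathbb{R}^n$. With the notation
$m = k + 1$ observe that $M = (\langle v_i, v_j \rangle)_{m \times m}$, where $\langle \cdot , \cdot \rangle$ stands for the scalar
product. Since the columns of $M$ are linearly dependent, we can fix a
$(\beta_1, \ldots, \beta_m) \in \mathbb{R}^m \setminus \{(0, \ldots, 0)\}$ such that $\sum_{j=1}^{m} \beta_j \langle v_i, v_j \rangle = 0$ holds for
$i = 1, \ldots, m$. Then

$$0 = \sum_{i=1}^{m} \beta_i \cdot 0 = \sum_{i=1}^{m} \beta_i \sum_{j=1}^{m} \beta_j \langle v_i, v_j \rangle = \sum_{i=1}^{m} \beta_i \Big\langle v_i, \sum_{j=1}^{m} \beta_j v_j \Big\rangle = \Big\langle \sum_{i=1}^{m} \beta_i v_i, \sum_{i=1}^{m} \beta_i v_i \Big\rangle ,$$

whence $\sum_{i=1}^{m} \beta_i v_i = 0$. Therefore all the $q_j$, $1 \le j \le n$, are solutions of the
following (one element) system of linear equations:

$$\beta_1 x_1 + \cdots + \beta_k x_k = -\beta_m . \qquad (1)$$

Since the system has solutions and $(\beta_1, \ldots, \beta_m) \ne (0, \ldots, 0)$, there is an
$i \in \{1, \ldots, k\}$ with $\beta_i \ne 0$. Hence the solutions of (1) form a hyperplane of $\mathbb{R}^k$. This
hyperplane contains $q_1, \ldots, q_n$. Now it follows that if $q_1, \ldots, q_n$ span $\mathbb{R}^k$ then
$\det(M) \ne 0$. Q.e.d.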
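
The least-squares fit of Section 2 and the spanning condition of the theorem can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `fit_affine` is hypothetical, NumPy is assumed, and a rank test is used as a numerical stand-in for the condition $\det(M) = 0$.

```python
import numpy as np


def fit_affine(q, p):
    """Return (T, M) for point pairs (p_i, q_i).

    q, p: arrays of shape (n, k) with the floating and reference points.
    T is the (k+1) x (k+1) homogeneous affine matrix minimizing
    sum_i ||T q_i - p_i||^2, and M is the matrix of the normal equations.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    n, k = q.shape
    qh = np.hstack([q, np.ones((n, 1))])   # rows (q_i1, ..., q_ik, 1)
    M = qh.T @ qh                          # entries a_uv, b_u and n
    rhs = qh.T @ p                         # column u is (c_u1, ..., c_uk, d_u)
    # Rank deficiency of M plays the role of det(M) = 0 (assumption:
    # a rank test is numerically safer than thresholding the determinant).
    if np.linalg.matrix_rank(M) < k + 1:
        raise ValueError("points do not span R^k; no unique solution")
    rows = np.linalg.solve(M, rhs).T       # row u is (t_u1, ..., t_u,k+1)
    T = np.vstack([rows, np.append(np.zeros(k), 1.0)])
    return T, M
```

For noise-free pairs generated by an affine map, `fit_affine` should return that map up to rounding; for points lying on a common hyperplane (for example collinear points in 2D) it raises an error, in line with the theorem.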

4 Estimating Registration Error 

Point-based registration might find imperfect matching due to the presence of 
error in localizing the points (note that points are often called fiducials). Maurer 
et al. [11] proposed three types of measures of error: 

— Fiducial localization error (FLE), which is the error in determining the po- 
sitions of the fiducials. 

— Fiducial registration error (FRE), which is the root mean square distance 
between corresponding points after registration. Note that point-based reg- 
istration methods minimize this error measure. 

— Target registration error (TRE) , which is the distance between corresponding 
points representing regions of interest (ROIs) after registration.

In real applications only FRE can be used, since neither FLE nor TRE can be
measured. Both FLE and TRE would require knowledge of the exact spatial
positions of the point pairs, and if we knew these, we would use them as the
pairs of points. Point selection, however, is always prone to some error. The
question is: if FRE is zero, does it really mean that the registration result is
perfect? The answer is no, in the sense that the goal of registration is not to
match the points but to match the images in which the points are selected. Thus,
using FRE as the measure of registration accuracy may be unreliable in some cases.

So, to what extent does the method tolerate errors in selecting points, and
how can we measure this? In real applications it can be estimated, e.g., visually.
In theory, we can run numerical simulations. In this case the exact spatial position
of the points is known, FLE can be modelled, and TRE can be calculated. In
the last decade, investigations have focused on TRE as a measure of the theoretical
accuracy of registration [7,12]. Note that each of these papers considers only
rigid-body transformations.

There are two important results concerning registration errors [12]: 

— Result 1. For a fixed number of fiducials, TRE is proportional to FLE . 

— Result 2. TRE is approximately proportional to $1/\sqrt{n}$, with $n$ being the
number of fiducials.

Fitzpatrick et al. [7] gave an expression for approximating TRE under the
assumption of rigid-body transformations, thus proving both Result 1 and Result 2.

In this paper we examine the behaviour of TRE for our affine method by means of
numerical simulations.
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4.1 Model for Numerical Simulations 

Let $\mathcal{M} = \{(x, y, z) \mid x, y, z \in \mathbb{R},\ 0 \le x, y, z \le 256\}$ be a cube-shaped region
in the 3D Euclidean space. Let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of $n$ points used
for modeling the fiducials identified in the reference image, where $p_i \in \mathcal{M}$
($1 \le i \le n$). A known affine transformation $T_{known}$ is chosen and the set
$R = \{r_i \mid r_i = T_{known} \cdot p_i,\ i = 1, \ldots, n\}$ is calculated. Set $R$ is corrupted by an
$n$-dimensional noise vector $(\mu_1, \ldots, \mu_n)$ whose components are random variables
having $\sigma$-Gaussian distribution. This is used for modeling the FLE. The set
$Q = \{q_i \mid q_i = r_i + \mu_i,\ i = 1, \ldots, n\}$ is constructed, where the pair $(p_i, q_i)$ of
points can be regarded as a pair of corresponding fiducials used for registration.
It is assumed that the FLE is identically zero in the reference (base) image. The set
$S = \{s_j \mid s_j \in \mathcal{M},\ j = 1, \ldots, m\}$ of $m$ points is randomly selected to represent
ROIs in the reference image. Note that the same $m = 20$ target points are
used for all our numerical simulations. Set $S$ is also transformed to generate the set
of $m$ points $U = \{u_j \mid u_j = T_{known} \cdot s_j,\ j = 1, \ldots, m\}$. The transformation $T_{found}$
is determined and it is applied to the set $U$ to calculate the set of $m$ points
$V = \{v_j \mid v_j = T_{found} \cdot u_j,\ j = 1, \ldots, m\}$.

TRE is formulated as follows: 



We repeated the iterations 10000 times. 
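
The simulation model can be sketched as below. Several details are assumptions rather than facts from the text: TRE is taken as the root mean square distance between $v_j$ and $s_j$ over the target points (the formula itself is missing above), the noise is drawn independently per coordinate from a zero-mean Gaussian with standard deviation `sigma`, new fiducials are drawn in every trial, and all function names are hypothetical.

```python
import numpy as np


def fit_affine_lstsq(q, p):
    """Homogeneous affine matrix T minimizing sum_i ||T q_i - p_i||^2."""
    n, k = q.shape
    qh = np.hstack([q, np.ones((n, 1))])
    coeffs, *_ = np.linalg.lstsq(qh, p, rcond=None)   # shape (k+1, k)
    return np.vstack([coeffs.T, np.append(np.zeros(k), 1.0)])


def apply_affine(T, x):
    """Apply a homogeneous affine matrix T to the rows of x."""
    return x @ T[:-1, :-1].T + T[:-1, -1]


def simulate_tre(T_known, n_fiducials, sigma, targets, trials, rng):
    """Mean TRE over `trials` runs (TRE = RMS distance, an assumption)."""
    tre = np.empty(trials)
    for trial in range(trials):
        # Fiducials p_i in the cube, assumed error-free in the reference image.
        p = rng.uniform(0.0, 256.0, size=(n_fiducials, 3))
        # Floating fiducials q_i = T_known p_i + noise (models the FLE).
        q = apply_affine(T_known, p) + rng.normal(0.0, sigma, size=p.shape)
        T_found = fit_affine_lstsq(q, p)
        # Targets s_j are mapped to u_j, then registered back to v_j.
        v = apply_affine(T_found, apply_affine(T_known, targets))
        tre[trial] = np.sqrt(np.mean(np.sum((v - targets) ** 2, axis=1)))
    return tre.mean()
```

Calling `simulate_tre` over a range of `sigma` values or fiducial counts, with a fixed set of 20 targets, yields curves of the kind discussed in the results below; the numbers will depend on the assumptions listed above.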

4.2 Results 

Figure 4 shows that TRE is proportional to FLE, for a fixed number of fiducials. 
Therefore, Result 1 holds for affine transformations, too. 

Figure 5 demonstrates how TRE depends on the number of fiducials for
a fixed FLE. Although Result 2 does not hold, it can be seen that TRE is
inversely proportional to the number of fiducials.

5 Conclusions 

In this paper we proposed a method capable of finding affine transformations 
based on selected pairs of points. We gave and proved a sufficient existence con- 
dition for a unique solution. We examined the theoretical registration accuracy 
of this method using numerical simulations. 

In practice, we have successfully used this method to register 3D MR brain
studies, in which 12 anatomical landmarks were interactively identified.
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Fig. 4. TRE (for the 20 target points) as a function of FLE for 10 fiducials. It 
is confirmed that TRE is proportional to FLE 



[Plot: Affine Motion; horizontal axis: Number of Fiducials]

Fig. 5. TRE (for the 20 target points) as a function of the number of fiducials $n$,
where a Gaussian distribution with $\sigma = 1$ is used for modelling FLE. TRE is
inversely proportional to $n$
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Abstract. The paper develops three extended Kalman filters (EKF) 
for 2D-3D pose estimation. The measurement models are based on three 
constraints which are constructed by geometric algebra. The dynamic 
measurements for these EKFs are either points or lines. Experiments with real
monocular images show that the EKFs yield more stable results than the LMS
method.



1 Introduction and problem statement 

The paper describes the design of EKFs which are used to estimate pose param- 
eters of known objects in the framework of kinematics. Pose estimation in the 
framework of kinematics will be treated as nonlinear optimization with respect 
to geometric constraint equations expressing the relation between 2D image fea- 
tures and 3D model data. 

The problem is described as follows. First, we make the following assump- 
tions. The model of an object is given by points and lines in the 3D space. 
Further we extract line subspaces or points in an image of a calibrated camera 
and match them with the model of the object. The aim is to find the pose of 
the object from observations of points and lines in the images at different poses. 
Figure 1 shows the scenario with respect to observed line subspaces. The method 
of obtaining these is out of scope of this paper. 

In more detail, the scenario of Figure 1 describes the following
situation: We assume 3D points $\{y_i\}$ and lines $\{S_i\}$, $i = 1, 2, \ldots$, belonging to
an object model. Further, we extract points $\{b_i\}$ and lines $\{l_i\}$ in an image of a
calibrated camera and match them with the model.

Three constraints can be formulated:

1. Point-line constraint: A transformed point, e.g. $x_1$, of the model point $y_1$
must lie on the projection ray $L_{b_1}$, given by the optical center $c$ and the
corresponding image point $b_1$.

2. Point-plane constraint: A transformed point, e.g. $x_1$, must lie on the
projection plane $P_{12}$, given by $c$ and the corresponding image line $l_1$.

3. Line-plane constraint: A transformed line, e.g. $L_1$, of the model line $S_1$ must
lie on the projection plane $P_{12}$, given by $c$ and the corresponding image
line $l_1$.

We want to estimate optimal motion parameters based on these three con- 
straints which formally are written [1, 2] in motor algebra [3, 4, 5] as 
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Fig. 1. The scenario. The solid lines at the left hand describe the assumptions: the 
camera model, the model of the object and the initially extracted lines on the image 
plane. The dashed lines at the right hand describe the actual pose of the model, which 
leads to the best fit of the object with the actual extracted lines. 



Point-line constraint: $X_1 L_{b_1} - L_{b_1} X_1 = I(m_1 - n_1 \times x_1) = 0$,

Point-plane constraint: $P_{12} X_1 - X_1 P_{12} = I(d_1 + p_1 \cdot x_1) = 0$,

Line-plane constraint: $L_1 P_{12} + P_{12} L_1 = u_1 \cdot p_1 + I(d_1 u_1 - p_1 \times v_1) = 0$.

In the above equations, we denote the point $X_1 = 1 + I x_1$, the lines $L_{b_1} =
n_1 + I m_1$ and $L_1 = u_1 + I v_1$, and the plane $P_{12} = p_1 + I d_1$. A more detailed
derivation and interpretation of these constraints is given in [1, 2]. We
use rotor algebra to describe points and their 3D kinematics, and motor algebra
to represent lines and to model their kinematics. The reason we use rotor and
motor algebra instead of matrix algebra is as follows. In the EKF, the state
vector to be estimated is the parameter vector of rotation and translation. With
rotor and motor algebra, there are 7 and 8 parameters, respectively. If we use
matrix algebra directly, there are up to 12 parameters (9 for rotation and 3 for
translation). Rotor or motor algebra is therefore more efficient.
Moreover, using motor algebra we linearize the 3D Euclidean line motion model
straightforwardly.

There are several approaches to optimal pose estimation based on least squares
methods [6]. Our preference is to use EKFs for pose estimation because of their
incremental, real-time potential and because of their robustness in the case of
noisy data. The robustness of the Kalman filter results from the fact that
stronger modeling of the dynamic model is possible using additional priors,
compared to the usual LMS estimators.

Because the EKF is a general framework for handling nonlinear measurement
models [7], the estimation of each considered constraint requires an individually
designed EKF. The commonly known EKFs for pose estimation are related to
3D-3D point based measurements. The only EKF for line based measurements
has been recently published by the authors [3]. That filter, however, estimates
the motion of a line from 3D-3D measurements in motor algebra and not
from 2D-3D measurements as in this paper. Zhang and Faugeras [8] used line
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segments in the frame of a point based standard EKF. Bar-Itzhack and Oshman 
[9] designed a quaternion EKF for point based rotation estimation. 

The paper is organized as follows. After introduction and problem statement, 
in section two we will present three EKF approaches for motion estimation. In 
section three we compare the performance of different algorithms for constraint 
based pose estimation. 

2 The Extended Kalman Filter for pose estimation 

In this section we present the design of EKFs for estimating the pose
based on the three constraints. Because an EKF is defined in the frame of linear
vector algebra, it is necessary to map the estimation task from any chosen
algebraic embedding to linear vector algebra (see e.g. [3]), at least as long as no
other solution exists. We present the design method in detail for constraint no. 1
in subsection 2.1. The design results for constraints no. 2 and 3 are given in
subsections 2.2 and 2.3, respectively.

2.1 EKF pose estimation based on point-line constraint 

In the case of point based measurements of the object at different poses, an algebraic
embedding of the problem in the 4D linear space of the algebra of rotors,
which is isomorphic to that of the quaternions $\mathbb{H}$, will be sufficient [4, 3]. Thus,
rotation will be represented by a unit rotor $R$ and translation will be a bivector
$t$. A point $y_1$ transformed to $x_1$ reads

$$x_1 = R \, y_1 \tilde{R} + t .$$

We denote the four components of the rotor as

$$R = r_0 + r_1 \sigma_2 \sigma_3 + r_2 \sigma_3 \sigma_1 + r_3 \sigma_1 \sigma_2 .$$

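To convert a rotor $R$ into a rotation matrix $\mathcal{R}$, simple conversion rules are
at hand:

$$\mathcal{R} = \begin{pmatrix} r_0^2 + r_1^2 - r_2^2 - r_3^2 & 2(r_1 r_2 + r_0 r_3) & 2(r_1 r_3 - r_0 r_2) \\ 2(r_1 r_2 - r_0 r_3) & r_0^2 - r_1^2 + r_2^2 - r_3^2 & 2(r_2 r_3 + r_0 r_1) \\ 2(r_1 r_3 + r_0 r_2) & 2(r_2 r_3 - r_0 r_1) & r_0^2 - r_1^2 - r_2^2 + r_3^2 \end{pmatrix}$$

In vector algebra, the above point transformation model can be described as

$$x_1 = \mathcal{R} y_1 + t .$$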
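
The conversion rule can be written down directly. The following is a small sketch assuming NumPy; the function name `rotor_to_matrix` is hypothetical, and the entries follow the matrix given above.

```python
import numpy as np


def rotor_to_matrix(r):
    """Rotation matrix for rotor components r = (r0, r1, r2, r3).

    The result is a proper rotation only if r has unit norm.
    """
    r0, r1, r2, r3 = r
    return np.array([
        [r0*r0 + r1*r1 - r2*r2 - r3*r3, 2*(r1*r2 + r0*r3), 2*(r1*r3 - r0*r2)],
        [2*(r1*r2 - r0*r3), r0*r0 - r1*r1 + r2*r2 - r3*r3, 2*(r2*r3 + r0*r1)],
        [2*(r1*r3 + r0*r2), 2*(r2*r3 - r0*r1), r0*r0 - r1*r1 - r2*r2 + r3*r3],
    ])


def transform_point(r, t, y):
    """Vector-algebra form of the point motion: x = R y + t."""
    return rotor_to_matrix(r) @ np.asarray(y, dtype=float) + np.asarray(t, dtype=float)
```

For a unit rotor the matrix is orthogonal with determinant one, which is why the unit condition has to be enforced on the rotor components during estimation.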

The projection ray $L_{b_1}$ in the point-line equation is represented by Plücker
coordinates $(n_1, m_1)$, where $n_1$ is its unit direction and $m_1$ its moment. The
point-line constraint equation in vector algebra of $\mathbb{R}^3$ reads

$$f_1 = m_1 - n_1 \times x_1 = m_1 - n_1 \times (\mathcal{R} y_1 + t) = 0 .$$
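
As an illustration, the point-line residual can be evaluated as follows. The sketch assumes NumPy and assumes the moment convention $m = n \times c$ for a ray through the optical center $c$, which is the convention implied by the constraint vanishing for points on the ray; the function names are hypothetical.

```python
import numpy as np


def projection_ray(c, b):
    """Pluecker coordinates (n, m) of the ray through c and b.

    n is the unit direction; the moment is taken as m = n x c (assumed
    convention, chosen so that m - n x x = 0 for every x on the ray).
    """
    c = np.asarray(c, dtype=float)
    b = np.asarray(b, dtype=float)
    n = (b - c) / np.linalg.norm(b - c)
    return n, np.cross(n, c)


def point_line_residual(n, m, R, t, y):
    """f1 = m - n x (R y + t); its norm is the distance of x from the ray."""
    x = R @ y + t
    return m - np.cross(n, x)
```

Because $n$ has unit length, the norm of the residual equals the Euclidean distance between the transformed point and the projection ray.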

Let the state vector $s$ for the EKF be a 7D vector, composed of the
rotor coefficients for rotation and of the translation,

$$s = (R^T, t^T)^T = (r_0, r_1, r_2, r_3, t_1, t_2, t_3)^T .$$

The rotation coefficients must satisfy the unit condition

$$f_2 = R^T R - 1 = r_0^2 + r_1^2 + r_2^2 + r_3^2 - 1 = 0 .$$

The noise free measurement vector $a_i$ is given by the actual line parameters
$n_i$ and $m_i$, and the actual 3D point measurements $y_i$,

$$a_i = (n_i^T, m_i^T, y_i^T)^T = (n_{i1}, n_{i2}, n_{i3}, m_{i1}, m_{i2}, m_{i3}, y_{i1}, y_{i2}, y_{i3})^T .$$
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For a sequence of measurements ai and states Si, the constraint equations 



$$f_i(a_i, s_i) = \begin{pmatrix} m_i - n_i \times (\mathcal{R}_i y_i + t_i) \\ R_i^T R_i - 1 \end{pmatrix} = 0$$

relate measurements and states in a nonlinear manner. The system model in this
static case should be

$$s_{i+1} = s_i + \zeta_i ,$$

where $\zeta_i$ is a vector random sequence with known statistics,

$$E[\zeta_i] = 0, \qquad E[\zeta_i \zeta_k^T] = Q_i \delta_{ik} ,$$

where $\delta_{ik}$ is the Kronecker delta and the matrix $Q_i$ is assumed to be nonnegative
definite.

We assume that the measurement system is disturbed by additive white noise,
i.e., the real observed measurement $a'_i$ is expressed as

$$a'_i = a_i + \eta_i .$$

The vector $\eta_i$ is an additive, random sequence with known statistics,

$$E[\eta_i] = 0, \qquad E[\eta_i \eta_k^T] = W_i \delta_{ik} ,$$

where the matrix $W_i$ is assumed to be nonnegative definite.

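Since the observation equation is nonlinear (that is, the relationship
between the measurement $a'_i$ and the state $s_i$ is nonlinear), we expand $f_i(a_i, s_i)$ into
a Taylor series about $(a'_i, \hat{s}_{i/i-1})$, where $a'_i$ is the real measurement and
$\hat{s}_{i/i-1}$ is the predicted state at situation $i$. By ignoring the second order terms, we get
the linearized measurement equation

$$z_i = \mathcal{H}_i s_i + \xi_i ,$$

where

$$z_i = f_i(a'_i, \hat{s}_{i/i-1}) - \frac{\partial f_i(a'_i, \hat{s}_{i/i-1})}{\partial s_i} \hat{s}_{i/i-1} = \begin{pmatrix} m'_i - n'_i \times (\hat{\mathcal{R}}_{i/i-1} y'_i + \hat{t}_{i/i-1}) \\ \hat{R}_{i/i-1}^T \hat{R}_{i/i-1} - 1 \end{pmatrix} + \mathcal{H}_i \hat{s}_{i/i-1} .$$

The measurement matrix $\mathcal{H}_i$ of the linearized measurement $z_i$ reads

$$\mathcal{H}_i = - \frac{\partial f_i(a'_i, \hat{s}_{i/i-1})}{\partial s_i} = \begin{pmatrix} C_{n'_i} \mathcal{D}_{\hat{R}} & C_{n'_i} \\ -2 \hat{R}_{i/i-1}^T & 0_{1 \times 3} \end{pmatrix} ,$$

where

$$\mathcal{D}_{\hat{R}} = \frac{\partial (\hat{\mathcal{R}}_{i/i-1} y'_i)}{\partial R_i} = \begin{pmatrix} d_1 & d_2 & d_3 & d_4 \\ d_4 & -d_3 & d_2 & -d_1 \\ -d_3 & -d_4 & d_1 & d_2 \end{pmatrix} ,$$

$$d_1 = 2(\hat{r}_{(i/i-1)0} \, y'_{i1} + \hat{r}_{(i/i-1)3} \, y'_{i2} - \hat{r}_{(i/i-1)2} \, y'_{i3}),$$
$$d_2 = 2(\hat{r}_{(i/i-1)1} \, y'_{i1} + \hat{r}_{(i/i-1)2} \, y'_{i2} + \hat{r}_{(i/i-1)3} \, y'_{i3}),$$
$$d_3 = 2(-\hat{r}_{(i/i-1)2} \, y'_{i1} + \hat{r}_{(i/i-1)1} \, y'_{i2} - \hat{r}_{(i/i-1)0} \, y'_{i3}),$$
$$d_4 = 2(-\hat{r}_{(i/i-1)3} \, y'_{i1} + \hat{r}_{(i/i-1)0} \, y'_{i2} + \hat{r}_{(i/i-1)1} \, y'_{i3}).$$

The $3 \times 3$ matrix $C_{n'_i}$ is the skew-symmetric matrix of $n'_i$. For any vector $y$,
we have $C_{n'_i} y = n'_i \times y$ with

$$C_{n'_i} = \begin{pmatrix} 0 & -n'_{i3} & n'_{i2} \\ n'_{i3} & 0 & -n'_{i1} \\ -n'_{i2} & n'_{i1} & 0 \end{pmatrix} .$$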
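
The derivative of the rotated point with respect to the rotor components, and the skew-symmetric matrix, can be checked numerically. The sketch below assumes NumPy; the function names are hypothetical, and the entries $d_1, \ldots, d_4$ are obtained by differentiating the rotation matrix of Section 2.1 with respect to $r_0, \ldots, r_3$.

```python
import numpy as np


def rotor_to_matrix(r):
    """Rotation matrix for rotor components r = (r0, r1, r2, r3)."""
    r0, r1, r2, r3 = r
    return np.array([
        [r0*r0 + r1*r1 - r2*r2 - r3*r3, 2*(r1*r2 + r0*r3), 2*(r1*r3 - r0*r2)],
        [2*(r1*r2 - r0*r3), r0*r0 - r1*r1 + r2*r2 - r3*r3, 2*(r2*r3 + r0*r1)],
        [2*(r1*r3 + r0*r2), 2*(r2*r3 - r0*r1), r0*r0 - r1*r1 - r2*r2 + r3*r3],
    ])


def rotation_jacobian(r, y):
    """3 x 4 matrix of partial derivatives of (R y) w.r.t. (r0, r1, r2, r3)."""
    r0, r1, r2, r3 = r
    y1, y2, y3 = y
    d1 = 2*(r0*y1 + r3*y2 - r2*y3)
    d2 = 2*(r1*y1 + r2*y2 + r3*y3)
    d3 = 2*(-r2*y1 + r1*y2 - r0*y3)
    d4 = 2*(-r3*y1 + r0*y2 + r1*y3)
    return np.array([[d1, d2, d3, d4],
                     [d4, -d3, d2, -d1],
                     [-d3, -d4, d1, d2]])


def skew(n):
    """Skew-symmetric matrix C_n with C_n y = n x y."""
    return np.array([[0.0, -n[2], n[1]],
                     [n[2], 0.0, -n[0]],
                     [-n[1], n[0], 0.0]])
```

Comparing `rotation_jacobian` with central finite differences of `rotor_to_matrix(r) @ y` is a convenient way to catch sign errors before the matrix is used inside a filter.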

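The measurement noise is given by

$$\xi_i = \frac{\partial f_i(a'_i, \hat{s}_{i/i-1})}{\partial a_i} (a'_i - a_i) = \begin{pmatrix} C_{\hat{x}_{i/i-1}} & I_{3 \times 3} & -C_{n'_i} \hat{\mathcal{R}}_{i/i-1} \\ 0_{1 \times 3} & 0_{1 \times 3} & 0_{1 \times 3} \end{pmatrix}_{4 \times 9} \eta_i ,$$

where $I_{3 \times 3}$ is a unit matrix and $C_{\hat{x}_{i/i-1}}$ is the skew-symmetric matrix of
$\hat{x}_{i/i-1}$ with

$$\hat{x}_{i/i-1} = \hat{\mathcal{R}}_{i/i-1} y'_i + \hat{t}_{i/i-1} .$$

The expectation and the covariance of the new measurement noise $\xi_i$ are
easily derived from that of $a'_i$ as

$$E[\xi_i] = 0, \qquad E[\xi_i \xi_i^T] = V_i = \left( \frac{\partial f_i(a'_i, \hat{s}_{i/i-1})}{\partial a_i} \right) W_i \left( \frac{\partial f_i(a'_i, \hat{s}_{i/i-1})}{\partial a_i} \right)^T .$$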
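
The propagation of the measurement covariance through the linearized constraint can be sketched as follows. This is an illustration under assumptions: the block layout of the Jacobian is derived here from $f_1 = m - n \times (\mathcal{R} y + t)$ with the measurement ordered as $(n, m, y)$, NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix C_v with C_v y = v x y."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def point_line_noise_covariance(n, R, t, y, W):
    """V = J W J^T for the point-line constraint.

    J is the 4 x 9 derivative of (f1, f2) with respect to a = (n, m, y);
    the last row is zero because the unit condition f2 does not depend on a.
    """
    x_pred = R @ y + t
    J = np.zeros((4, 9))
    J[:3, 0:3] = skew(x_pred)     # d f1 / d n
    J[:3, 3:6] = np.eye(3)        # d f1 / d m
    J[:3, 6:9] = -skew(n) @ R     # d f1 / d y
    return J @ W @ J.T
```

With $W = I$ and a unit direction $n$, the trace of the result is $2\|\hat{x}\|^2 + 5$, so the propagated noise grows quadratically with the distance of the predicted point from the origin.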



The EKF motion estimation algorithms based on point-plane and line-plane 
constraints can be derived in a similar way. We list the results below. 



2.2 EKF pose estimation based on point-plane constraint 



The projection plane $P_{12}$ in the point-plane constraint equation is represented
by $(d_1, p_1)$, where $d_1$ is its Hesse distance and $p_1$ its unit direction. The
point-plane constraint equation in vector algebra of $\mathbb{R}^3$ reads

$$d_1 - p_1^T (\mathcal{R} y_1 + t) = 0 .$$
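
A short sketch of the point-plane residual follows. It assumes NumPy and constructs the projection plane from the optical center and two further points on it (for instance, points on the rays through the end points of an image line); this construction and the function names are assumptions made for illustration.

```python
import numpy as np


def projection_plane(c, a, b):
    """Hesse form (d, p) of the plane through c, a and b.

    p is the unit normal and d = p . c, so that d - p . x = 0 on the plane.
    """
    c = np.asarray(c, dtype=float)
    p = np.cross(np.asarray(a, dtype=float) - c, np.asarray(b, dtype=float) - c)
    p = p / np.linalg.norm(p)
    return float(p @ c), p


def point_plane_residual(d, p, R, t, y):
    """d - p^T (R y + t); its absolute value is the distance from the plane."""
    return d - p @ (R @ y + t)
```

Unlike the point-line residual, this constraint is a single scalar, which removes the cross product term from the measurement equation.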

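With the measurement vector $a_i = (d_i, p_i^T, y_i^T)^T$ and the same state vector
$s$ as above, the measurement $z_i$ of the linearized measurement equation reads

$$z_i = \begin{pmatrix} d'_i - {p'_i}^T (\hat{\mathcal{R}}_{i/i-1} y'_i + \hat{t}_{i/i-1}) \\ \hat{R}_{i/i-1}^T \hat{R}_{i/i-1} - 1 \end{pmatrix} + \mathcal{H}_i \hat{s}_{i/i-1} .$$

The measurement matrix $\mathcal{H}_i$ of the linearized measurement $z_i$ now reads

$$\mathcal{H}_i = \begin{pmatrix} {p'_i}^T \mathcal{D}_{\hat{R}} & {p'_i}^T \\ -2 \hat{R}_{i/i-1}^T & 0_{1 \times 3} \end{pmatrix} .$$

The measurement noise is given by

$$\xi_i = \begin{pmatrix} 1 & -(\hat{\mathcal{R}}_{i/i-1} y'_i + \hat{t}_{i/i-1})^T & -{p'_i}^T \hat{\mathcal{R}}_{i/i-1} \\ 0 & 0_{1 \times 3} & 0_{1 \times 3} \end{pmatrix} \eta_i .$$
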

2.3 EKF pose estimation based on line-plane constraint 

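Using the line-plane constraint, the reference model entity in [3, 4] is the
Plücker line $S_1 = n_1 + I m_1$. This line transformed by a motor $M = R + I R'$
reads

$$L_1 = M S_1 \tilde{M} = R n_1 \tilde{R} + I (R n_1 \tilde{R}' + R' n_1 \tilde{R} + R m_1 \tilde{R}) = u_1 + I v_1 .$$

We denote the 8 components of the motor as

$$M = R + I R' = r_0 + r_1 \gamma_2 \gamma_3 + r_2 \gamma_3 \gamma_1 + r_3 \gamma_1 \gamma_2 + I (r'_0 + r'_1 \gamma_2 \gamma_3 + r'_2 \gamma_3 \gamma_1 + r'_3 \gamma_1 \gamma_2) .$$

The line motion equation can be equivalently expressed in vector form,

$$u_1 = \mathcal{R} n_1 ,$$
$$v_1 = \mathcal{A} n_1 + \mathcal{R} m_1 ,$$

with

$$\mathcal{A} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} ,$$

$$a_{11} = 2(r'_0 r_0 + r'_1 r_1 - r'_2 r_2 - r'_3 r_3), \quad a_{12} = 2(r'_3 r_0 + r'_2 r_1 + r'_1 r_2 + r'_0 r_3),$$
$$a_{13} = 2(-r'_2 r_0 + r'_3 r_1 - r'_0 r_2 + r'_1 r_3), \quad a_{21} = 2(-r'_3 r_0 + r'_2 r_1 + r'_1 r_2 - r'_0 r_3),$$
$$a_{22} = 2(r'_0 r_0 - r'_1 r_1 + r'_2 r_2 - r'_3 r_3), \quad a_{23} = 2(r'_1 r_0 + r'_0 r_1 + r'_3 r_2 + r'_2 r_3),$$
$$a_{31} = 2(r'_2 r_0 + r'_3 r_1 + r'_0 r_2 + r'_1 r_3), \quad a_{32} = 2(-r'_1 r_0 - r'_0 r_1 + r'_3 r_2 + r'_2 r_3),$$
$$a_{33} = 2(r'_0 r_0 - r'_1 r_1 - r'_2 r_2 + r'_3 r_3).$$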
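
The entries of $\mathcal{A}$ listed above coincide with the directional derivative of the rotation matrix $\mathcal{R}$ along the dual components $(r'_0, r'_1, r'_2, r'_3)$. This is an observation read off the formulas rather than a statement made in the text, and the sketch below (NumPy assumed, function names hypothetical) is one way to check it numerically.

```python
import numpy as np


def rotor_to_matrix(r):
    """Rotation matrix for rotor components r = (r0, r1, r2, r3)."""
    r0, r1, r2, r3 = r
    return np.array([
        [r0*r0 + r1*r1 - r2*r2 - r3*r3, 2*(r1*r2 + r0*r3), 2*(r1*r3 - r0*r2)],
        [2*(r1*r2 - r0*r3), r0*r0 - r1*r1 + r2*r2 - r3*r3, 2*(r2*r3 + r0*r1)],
        [2*(r1*r3 + r0*r2), 2*(r2*r3 - r0*r1), r0*r0 - r1*r1 - r2*r2 + r3*r3],
    ])


def dual_matrix(r, rp):
    """Matrix A built from the real part r and the dual part rp of a motor."""
    r0, r1, r2, r3 = r
    q0, q1, q2, q3 = rp
    return 2.0 * np.array([
        [q0*r0 + q1*r1 - q2*r2 - q3*r3, q3*r0 + q2*r1 + q1*r2 + q0*r3, -q2*r0 + q3*r1 - q0*r2 + q1*r3],
        [-q3*r0 + q2*r1 + q1*r2 - q0*r3, q0*r0 - q1*r1 + q2*r2 - q3*r3, q1*r0 + q0*r1 + q3*r2 + q2*r3],
        [q2*r0 + q3*r1 + q0*r2 + q1*r3, -q1*r0 - q0*r1 + q3*r2 + q2*r3, q0*r0 - q1*r1 - q2*r2 + q3*r3],
    ])


def transform_line(r, rp, n, m):
    """Vector form of the line motion: u = R n, v = A n + R m."""
    Rm = rotor_to_matrix(r)
    return Rm @ n, dual_matrix(r, rp) @ n + Rm @ m
```

When the dual part `rp` is zero, `transform_line` reduces to a pure rotation of both the direction and the moment.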



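The line-plane constraint equation in vector algebra of $\mathbb{R}^3$ reads

$$\begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} p_1^T u_1 \\ d_1 u_1 + v_1 \times p_1 \end{pmatrix} = \begin{pmatrix} p_1^T (\mathcal{R} n_1) \\ d_1 \mathcal{R} n_1 + (\mathcal{A} n_1 + \mathcal{R} m_1) \times p_1 \end{pmatrix} = 0 .$$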



We use the 8 components of the motor as the state vector for the EKF,

$$s = (r_0, r_1, r_2, r_3, r'_0, r'_1, r'_2, r'_3)^T ,$$

and these 8 components must satisfy both the unit and orthogonal conditions:

$$f_3 = r_0^2 + r_1^2 + r_2^2 + r_3^2 - 1 = 0 ,$$
$$f_4 = r_0 r'_0 + r_1 r'_1 + r_2 r'_2 + r_3 r'_3 = 0 .$$

The 10D noise free measurement vector $a_i$ is given by the true plane parameters
$d_i$ and $p_i$, and the true 6D line parameters $(n_i, m_i)$,

$$a_i = (d_i, p_i^T, n_i^T, m_i^T)^T = (d_i, p_{i1}, p_{i2}, p_{i3}, n_{i1}, n_{i2}, n_{i3}, m_{i1}, m_{i2}, m_{i3})^T .$$



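The new measurement in the linearized equation reads

$$z_i = \begin{pmatrix} {p'_i}^T \hat{\mathcal{R}}_{i/i-1} n'_i \\ d'_i \hat{\mathcal{R}}_{i/i-1} n'_i + (\hat{\mathcal{A}}_{i/i-1} n'_i + \hat{\mathcal{R}}_{i/i-1} m'_i) \times p'_i \\ \hat{R}_{i/i-1}^T \hat{R}_{i/i-1} - 1 \\ \hat{R}_{i/i-1}^T \hat{R}'_{i/i-1} \end{pmatrix} + \mathcal{H}_i \hat{s}_{i/i-1} .$$
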
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The measurement matrix T-Li of the linearized measurement Zi reads 



/ 






-pTX>^ , 



Or 



X4 






0lx4 



where ^ 



0Ri 



7?r' = 



and oX>R = -^ 7 -^ 



aR; 2 

The 3x3 matrix Cp'. is the skew-symmetric matrix of p-. 

The measurement noise is given by 

/ 0 nl'^'^i/;_l ^ Pi^'^i/i-l Oix3 

“ I ’^i/i-lRi d;"^;/; _ 1 — Cp'^/; _ 1 — Cp'"^;/; _ 1 j rji 

\ 02x3 02x3 02x3 02x3 



where C^. is skew-symmetric matrix of Vi, and Vi is defined as 
Vi = ^/i - inl -I- 'Ri/i _ iml. 

Having linearized the measurement models, the EKF implementation is
straightforward and standard. Further implementation details will not be repeated here
[2, 7, 3, 8]. In the next section, we denote the EKF as RtEKF if the state
explicitly uses the rotor components of rotation $R$ and of translation $t$, or as MEKF
if the motor components of the motion $M$ are used.
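
For orientation, one filter cycle on a linearized model $z_i = \mathcal{H}_i s_i + \xi_i$ with the static system model $s_{i+1} = s_i + \zeta_i$ can be sketched with the textbook Kalman equations. This is not the authors' implementation: the function names are hypothetical, NumPy is assumed, and a pseudo-inverse is used as a precaution because the covariance of $\xi_i$ has zero rows for the noise-free unit condition.

```python
import numpy as np


def ekf_update(s_pred, P_pred, z, H, V):
    """Measurement update for z = H s + xi with cov(xi) = V."""
    S = H @ P_pred @ H.T + V                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.pinv(S)     # Kalman gain
    s_new = s_pred + K @ (z - H @ s_pred)
    P_new = (np.eye(len(s_pred)) - K @ H) @ P_pred
    return s_new, P_new


def ekf_predict(s, P, Q):
    """Prediction for the static model s_{i+1} = s_i + zeta_i, cov(zeta_i) = Q."""
    return s.copy(), P + Q
```

In an EKF of the kind described above, `z`, `H` and `V` would be recomputed from the current prediction and the current measurement before every call to `ekf_update`.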



2.4 Some notes about the algorithms 

Here we will give some specific notes on EKF algorithms. 

The EKF algorithm requires an initial guess of the motion that is not very far
from the true one. One reasonable hypothesis is that the motion is "small". So we
can set the initial guess as "no motion": $\hat{s}_{1/0} = (1\ 0\ 0\ 0\ 0\ 0\ 0)^T$ and
$\hat{s}_{1/0} = (1\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^T$ for RtEKF and MEKF, respectively. In
experiments, the estimate converges rapidly from the initial guess to near the
true one within 4 or 5 runs, but for a qualified estimation, more than 15 runs
are required.

In our experiments we find that, if the translation satisfies $\|t\| > 1$, the
algorithms based on constraints no. 1 and no. 3 frequently diverge. The reason is
that these constraints contain cross product terms. The situation can be analyzed
by means of the equation for the measurement noise $\xi_i$. In constraint no. 1,
suppose we set the origin of the coordinate system somewhere on the reference
model. If the estimated translation satisfies $\|\hat{t}_i\| > 1$, then (usually)
$\|\hat{x}_{i/i-1}\| > 1$. This directly causes the components of the covariance matrix
$V_i$ to be far greater than those of the original covariance matrix $W_i$. Such
enlarged noise easily makes the EKF diverge. To solve this problem, we simply
multiply the measurement function by a scalar as follows. At the beginning of the
algorithm, we check the distance $\|m_i\|$ of the projection line $L_{b_i}$. If
$\|m_i\| > 1$, we use the modified measurement equation

$$f_{1i} / \|m_i\| = m_i / \|m_i\| - n_i \times (\mathcal{R}_i y_i / \|m_i\| + t_i / \|m_i\|) = 0 .$$

At the end of the algorithm we multiply the estimated translation $t^*_i = t_i / \|m_i\|$
by $\|m_i\|$ to recover the true estimate $t_i$. A similar analysis can be done for
constraint no. 3. In the case of constraint no. 3, we use the distance $d_i$ of the
projection plane $P_{ij}$, that is, we divide $f_{2i}$ by $d_i$. At the end of the algorithm,
we multiply the estimated dual part ${R'_i}^* = R'_i / d_i$ by $d_i$ to recover the true
estimate $R'_i$.

3 Experiments 

In this section we present some experiments with real images. The aim of the
experiments is to study the performance of the EKF algorithms for pose estimation
based on geometric constraints. We expect that both the particular constraint and
the algorithmic approach of using it may influence the results. This behavior
should be shown with respect to different qualities of data.

In our experimental scenario we took a B21 mobile robot equipped with a
stereo camera head and positioned it two meters in front of a calibration cube.
We focused one camera on the calibration cube and took an image. Then we
moved the robot, focused the camera again on the cube and took another image.
The edge size of the calibration cube is 46 cm and the image size is 384 x 288
pixels. Furthermore, we defined a 3D object model on the calibration cube.

In these experiments we selected certain points by hand; from these the
depicted lines are derived and, by knowing the camera calibration, the
actual projection ray and projection plane parameters are computed.

The results of different algorithms for pose estimation are shown in Table 1.
In the second column of Table 1, RtEKF and MEKF denote the use of the EKF,
MAT denotes matrix algebra, and SVD denotes the singular value decomposition of
a matrix to ensure a rotation matrix as a result. In the third column the
constraints used, point-line (XL), point-plane (XP) and line-plane (LP), are indicated.
The fourth column shows the results for the estimated rotation matrix $\mathcal{R}$ and
the translation vector $t$, respectively. The fifth column shows the error of the
equation system. Since the error of the equation system describes the Hesse
distance of the entities [1], the value of the error is an approximation of the
squared average distance of the entities.

In a second experiment we compare the noise sensitivity of the various
approaches to pose estimation. Matrix based estimations result in both higher
errors and larger fluctuations as a function of the noise level compared to EKF
estimates. This is in agreement with the well known behavior of error
propagation in the case of matrix based rotation estimation. The EKF is more stable.
This is a consequence of the estimators themselves and of the fact that in our
approach rotation is represented by rotors. The concatenation of rotors is more
robust than that of rotation matrices.



4 Conclusions 

In this paper we present three EKF algorithms for 2D-3D pose estimation. The
aim of the paper is to design EKFs based on three geometric constraints. The
model data are either points or lines. The observation frame is constituted by
projection lines or projection planes. Any deviation from a constraint
corresponds to the Hesse distance of the involved geometric entities. The representation
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Table 1. The results of experiment 1, depending on the used constraints and algorithms 
to evaluate their validity. 




Fig. 2. Performance comparison with increasing noise. The EKFs perform with more 
accurate and more stable estimates than the matrix based method. 
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of both rotation and translation is assumed as rotors and motors in even geo- 
metric subalgebras of the Euclidean space. The experiments show advantages of 
that representation and of the EKF approach in comparison to normal matrix 
based LMS algorithms. 
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